Bowen Zhou, Bing Xiang, et al.
SSST 2008
This paper describes several language modeling issues in a speech-to-speech translation system. First, the language models for the speech recognizer need to be adapted to the specific domain to improve recognition performance on in-domain utterances while keeping domain coverage as broad as possible. Second, when a maximum-entropy-based statistical natural language generation model is used to produce the target-language sentence as the translation output, serious inflection and synonym errors arise, because a compromise semantic representation is used to avoid the data sparseness problem. We use N-gram models as a post-processing step to improve generation quality. When an interpolated language model is applied to a Chinese-to-English translation task, translation performance, measured by the BLEU metric, improves substantially from 0.318 to 0.514 when the correct transcription is used as input. Similarly, the BLEU score improves from 0.194 to 0.300 on the same task when the input is speech data.
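A minimal sketch of the kind of N-gram post-processing and language-model interpolation the abstract refers to: an interpolated bigram model (in-domain plus general, mixed with weight lam) rescores candidate surface realizations that differ only in inflection or word choice. The probability tables, candidate sentences, and the weight value below are hypothetical placeholders, not taken from the paper.

```python
import math

def interp_logprob(bigram, p_in, p_gen, lam=0.7, floor=1e-6):
    """Log of the linearly interpolated probability of one bigram."""
    p = lam * p_in.get(bigram, 0.0) + (1.0 - lam) * p_gen.get(bigram, 0.0)
    return math.log(max(p, floor))

def score(sentence, p_in, p_gen):
    """Sum interpolated bigram log-probabilities over a tokenized sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(interp_logprob((a, b), p_in, p_gen)
               for a, b in zip(tokens, tokens[1:]))

# Toy bigram tables (hypothetical) and two candidate realizations that
# differ only in verb inflection, as an NLG step might produce.
p_in = {("<s>", "he"): 0.2, ("he", "wants"): 0.15, ("he", "want"): 0.01,
        ("wants", "tickets"): 0.1, ("want", "tickets"): 0.1,
        ("tickets", "</s>"): 0.3}
p_gen = {("<s>", "he"): 0.1, ("he", "wants"): 0.05, ("he", "want"): 0.02,
         ("wants", "tickets"): 0.02, ("want", "tickets"): 0.02,
         ("tickets", "</s>"): 0.2}

candidates = ["he wants tickets", "he want tickets"]
best = max(candidates, key=lambda s: score(s, p_in, p_gen))
print(best)  # the interpolated LM prefers the grammatical inflection
```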
Haiping Li, Fangxin Chen, et al.
ICASSP 2003
Ruhi Sarikaya, Yuqing Gao, et al.
ICASSP 2004
Fu-Hua Liu, Yuqing Gao
ISCSLP 2004