Language model adaptation via minimum discrimination information

P. Srinivasa Rao; M.D. Monkowski; Salim Roukos

ICASSP 1995

Conference paper

09 May 1995

Abstract

Statistical language models improve the performance of speech recognition systems by providing estimates of the a priori probabilities of word sequences. The commonly used trigram language models estimate the conditional probability of a word given the previous two words from a large text corpus. The corpus is often a collection of several small, diverse segments, such as newspaper articles or conversations on different topics. Knowledge of the current topic could be used to adapt the general trigram language model to match that topic more closely. For example, the general language model could be interpolated with one built on the topic data. We first discuss the adaptation of general trigram language models to a known topic using the minimum discrimination information (MDI) method. We then present results on the Switchboard corpus, which consists of telephone conversations on several topics.
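
The interpolation baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's MDI method. It only shows maximum-likelihood trigram estimates and a linear mixture of a general and a topic model. The function names, the toy sentences, and the weight `lam` are assumptions made for illustration, and no smoothing or back-off is applied.

```python
from collections import Counter


def trigram_mle(tokens):
    """Maximum-likelihood estimates of P(w3 | w1, w2) from a list of tokens."""
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    # Count a bigram only where a third word follows it, so that the
    # estimates for each two-word history sum to one.
    histories = Counter(zip(tokens, tokens[1:-1]))
    return {
        (w1, w2, w3): count / histories[(w1, w2)]
        for (w1, w2, w3), count in trigrams.items()
    }


def interpolate(p_general, p_topic, lam):
    """Linear mixture: lam * P_topic + (1 - lam) * P_general.

    Trigrams missing from one model contribute probability 0 from that model.
    Histories seen by only one model are therefore not renormalized here;
    a practical system would smooth both models first.
    """
    keys = set(p_general) | set(p_topic)
    return {
        k: lam * p_topic.get(k, 0.0) + (1.0 - lam) * p_general.get(k, 0.0)
        for k in keys
    }


# Toy data (hypothetical): a "general" corpus and a small "topic" corpus.
general_tokens = "the cat sat on the mat the cat ran".split()
topic_tokens = "the cat sat on the sofa".split()

p_general = trigram_mle(general_tokens)
p_topic = trigram_mle(topic_tokens)
adapted = interpolate(p_general, p_topic, lam=0.5)

# The history ("the", "cat") appears in both corpora, so the mixture
# shifts probability toward the continuation favored by the topic data.
print(adapted[("the", "cat", "sat")])
print(adapted[("the", "cat", "ran")])
```

In this toy case the general model splits the history "the cat" evenly between "sat" and "ran", while the topic model has only seen "sat", so the mixture moves probability mass toward "sat". The weight `lam` would normally be tuned on held-out topic data.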
