C. Neti, Salim Roukos
ASRU 1997
Statistical language models improve the performance of speech recognition systems by providing estimates of the a priori probabilities of word sequences. The commonly used trigram language models estimate the conditional probability of a word given the previous two words from a large corpus of text. The text corpus is often a collection of several small, diverse segments, such as newspaper articles or conversations on different topics. Knowledge of the current topic could be used to adapt the general trigram language model to match that topic more closely. For example, an interpolation of the general language model with one built on the topic data could be used. We first discuss the adaptation of general trigram language models to a known topic using the minimum discrimination information (MDI) method. We then present results on the Switchboard corpus, which consists of telephone conversations on several topics.
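
As a rough illustration of the interpolation mentioned in the abstract (not the MDI method itself), the sketch below linearly mixes a general and a topic-specific conditional distribution for a single two-word history. The function name `interpolate`, the weight `lam`, and all probability values are assumptions made up for this example; they are not taken from the paper.

```python
def interpolate(p_general, p_topic, lam=0.5):
    """Linearly interpolate two conditional word distributions for one history.

    p_general, p_topic: dicts mapping word -> P(word | previous two words).
    lam: weight on the topic model (hypothetical parameter, 0 <= lam <= 1).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    vocab = set(p_general) | set(p_topic)
    return {
        w: lam * p_topic.get(w, 0.0) + (1.0 - lam) * p_general.get(w, 0.0)
        for w in vocab
    }


# Toy distributions for one history; the numbers are invented for illustration.
p_general = {"market": 0.5, "room": 0.3, "price": 0.2}
p_topic = {"market": 0.7, "price": 0.3}

p_adapted = interpolate(p_general, p_topic, lam=0.4)
```

Because both inputs sum to one and the weights sum to one, the mixed distribution also sums to one. In practice the weight would typically be tuned on held-out topic data rather than fixed by hand.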