Miroslav Novak
INTERSPEECH - Eurospeech 2005
This paper presents an unsupervised method that uses a limited amount of labeled data and a large pool of unlabeled data to improve natural language call routing performance. The method uses multiple classifiers to select a subset of the unlabeled data, which then augments the limited labeled data. We evaluated four widely used text classification algorithms: Naive Bayes Classification (NBC), Support Vector Machines (SVM), Boosting, and Maximum Entropy (MaxEnt). NBC was found to be the poorest performer of the four. Combining SVM, Boosting, and MaxEnt yielded significant improvements in call classification accuracy over any single classifier across varying amounts of labeled data.
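The abstract does not state how the classifiers choose which unlabeled utterances to keep. As a rough illustration only, the sketch below assumes a simple vote: an utterance is kept, with the majority label, when enough classifiers agree on it. The function name `select_by_agreement`, the `min_votes` parameter, and the toy utterances and labels are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def select_by_agreement(utterances, predictions, min_votes=None):
    """Keep unlabeled utterances whose predicted label has enough votes.

    utterances:  list of unlabeled utterance strings
    predictions: one label list per classifier, aligned with utterances
    min_votes:   votes required to keep an utterance; defaults to all
                 classifiers agreeing (an assumption, not the paper's rule)
    """
    if min_votes is None:
        min_votes = len(predictions)
    selected = []
    for i, utterance in enumerate(utterances):
        # Count how many classifiers voted for each label on this utterance.
        votes = Counter(labels[i] for labels in predictions)
        label, count = votes.most_common(1)[0]
        if count >= min_votes:
            selected.append((utterance, label))
    return selected


# Toy outputs standing in for SVM, Boosting, and MaxEnt predictions.
utterances = ["i want to check my balance", "uh the thing", "cancel my card"]
svm_labels = ["balance", "billing", "cancel"]
boosting_labels = ["balance", "agent", "cancel"]
maxent_labels = ["balance", "billing", "cancel"]

all_labels = [svm_labels, boosting_labels, maxent_labels]
augmented = select_by_agreement(utterances, all_labels)
relaxed = select_by_agreement(utterances, all_labels, min_votes=2)
```

With unanimous agreement, the ambiguous second utterance is left out of the augmented set; lowering `min_votes` to 2 admits it with the majority label, trading label precision for more training data.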
Hagen Soltau, George Saon, et al.
IEEE Transactions on Audio, Speech and Language Processing
Ruhi Sarikaya, Yuqing Gao, et al.
ICASSP 2004
Youssef Mroueh, Etienne Marcheret, et al.
AISTATS 2017