Exploiting unlabeled data using multiple classifiers for improved natural language call-routing

Ruhi Sarikaya; Hong-Kwang Jeff Kuo; Vaibhava Goel; Yuqing Gao

INTERSPEECH - Eurospeech 2005

Conference paper

01 Dec 2005

Abstract

This paper presents an unsupervised method that uses a limited amount of labeled data and a large pool of unlabeled data to improve natural language call routing performance. The method uses multiple classifiers to select a subset of the unlabeled data with which to augment the limited labeled data. We evaluated four widely used text classification algorithms: Naive Bayes Classification (NBC), Support Vector Machines (SVM), Boosting and Maximum Entropy (MaxEnt). NBC was found to be the poorest performer of the four. Combining SVM, Boosting and MaxEnt resulted in significant improvements in call classification accuracy over any single classifier, across varying amounts of labeled data.
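The abstract does not state the exact rule used to select unlabeled data or to combine the classifiers. The following is a minimal sketch that assumes two things: an unlabeled utterance is kept only when at least `min_agree` classifiers (by default all of them) predict the same route, and the combined classifier is a simple majority vote. The functions `select_agreed`, `augment`, `majority_vote` and `keyword_classifier`, and the route labels, are hypothetical. The toy keyword classifiers stand in for trained SVM, Boosting and MaxEnt models.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A "classifier" is any callable mapping an utterance to a route label.
# Hypothetical stand-ins for trained SVM, Boosting and MaxEnt models.
Classifier = Callable[[str], str]


def majority_vote(classifiers: Sequence[Classifier], text: str) -> str:
    """Combine classifiers by simple majority vote (an assumed rule).

    Ties go to the label that was voted for first.
    """
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]


def select_agreed(
    classifiers: Sequence[Classifier],
    unlabeled: Sequence[str],
    min_agree: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Return (utterance, label) pairs whose top label gets >= min_agree votes.

    Assumed selection criterion: by default every classifier must agree.
    """
    if min_agree is None:
        min_agree = len(classifiers)
    selected = []
    for text in unlabeled:
        label, count = Counter(clf(text) for clf in classifiers).most_common(1)[0]
        if count >= min_agree:
            selected.append((text, label))
    return selected


def augment(
    labeled: Sequence[Tuple[str, str]],
    classifiers: Sequence[Classifier],
    unlabeled: Sequence[str],
    min_agree: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Append the automatically labeled subset to the manually labeled data."""
    return list(labeled) + select_agreed(classifiers, unlabeled, min_agree)


def keyword_classifier(keywords: Dict[str, str], default: str) -> Classifier:
    """Toy classifier: label of the first matching keyword, else the default."""
    def classify(text: str) -> str:
        words = text.lower().split()
        for word, label in keywords.items():
            if word in words:
                return label
        return default
    return classify


if __name__ == "__main__":
    clf_a = keyword_classifier({"bill": "billing", "internet": "tech_support"}, "agent")
    clf_b = keyword_classifier(
        {"payment": "billing", "bill": "billing", "router": "tech_support"}, "agent")
    clf_c = keyword_classifier(
        {"bill": "billing", "internet": "tech_support", "router": "tech_support"}, "agent")
    classifiers = [clf_a, clf_b, clf_c]

    labeled = [("i want to pay my bill", "billing")]
    unlabeled = ["question about my bill", "my internet is down", "reset my router", "hello"]

    print(select_agreed(classifiers, unlabeled))   # unanimous agreement only
    print(majority_vote(classifiers, "my internet is down"))
    print(len(augment(labeled, classifiers, unlabeled, min_agree=2)))
```

Under unanimous agreement only "question about my bill" and "hello" are added. Relaxing `min_agree` to 2 of 3 admits all four utterances. This illustrates the trade-off such a scheme would face: stricter agreement admits fewer, presumably cleaner, automatically labeled examples.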
