L.R. Bahl, S. Balakrishnan-Aiyer, et al.
ICASSP 1995
Previous work on the distribution of words in documents has shown that word repetitiveness is an important indicator of a word's content-bearing characteristics. In this paper we propose a simple method that uses a measure of the tendency of words to repeat within a document to separate words with similar document frequencies but different topic-discriminating characteristics. We describe the application of the new measure to query-document relevance scoring. Experiments on the TREC Ad Hoc and Spoken Document Retrieval tasks show useful performance improvements.
Y. Al-Onaizan, R. Florian, et al.
NAACL-HLT 2003
A. Aaron, S. Chen, et al.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings