Jingrui He, Yada Zhu
ICDM 2012
Virtually all work on topic modeling has assumed that the topics are to be learned over a text-based document corpus. However, there are important applications in which topic models must be learned over an audio corpus of spoken language. Unfortunately, speech-to-text programs can have very low accuracy. We therefore propose a novel topic model for spoken language that incorporates a statistical model of speech-to-text software behavior. Crucially, our model exploits the uncertainty estimates returned by the software. Our ideas apply to any domain in which it would be useful to build a topic model over data whose uncertainties are explicitly represented. © 2012 IEEE.
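The abstract does not spell out the model itself, so the following is only an illustration of the general idea, not the authors' method. One simple way to let speech-to-text uncertainty reach a topic model is to replace hard word counts from a single best transcript with confidence-weighted expected counts, which a topic model could then consume. The input format (a list of `(word, confidence)` alternatives per spoken token) and the function names `expected_counts` and `one_best_counts` are hypothetical assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical ASR output: each spoken token is a list of
# (candidate_word, confidence) alternatives.
Token = List[Tuple[str, float]]


def expected_counts(transcript: List[Token]) -> Dict[str, float]:
    """Confidence-weighted bag of words for one spoken document."""
    counts: Dict[str, float] = {}
    for alternatives in transcript:
        total = sum(conf for _, conf in alternatives)
        if total <= 0:
            continue  # skip tokens that carry no usable confidence mass
        for word, conf in alternatives:
            # Renormalize so each spoken token contributes exactly one unit.
            counts[word] = counts.get(word, 0.0) + conf / total
    return counts


def one_best_counts(transcript: List[Token]) -> Dict[str, int]:
    """Baseline: keep only the highest-confidence word for each token."""
    counts: Dict[str, int] = {}
    for alternatives in transcript:
        if not alternatives:
            continue
        best_word, _ = max(alternatives, key=lambda pair: pair[1])
        counts[best_word] = counts.get(best_word, 0) + 1
    return counts


if __name__ == "__main__":
    doc = [
        [("topic", 0.75), ("tropic", 0.25)],
        [("model", 0.875), ("modal", 0.125)],
        [("topic", 1.0)],
    ]
    print(expected_counts(doc))
    print(one_best_counts(doc))
```

In this toy input, the one-best counts discard the 0.25 mass on "tropic" entirely, while the expected counts keep it, so a downstream model can weigh unreliable recognitions accordingly.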
Jingrui He, Yada Zhu
ICDM 2012
Ihab F. Ilyas, Volker Markl, et al.
SIGMOD 2004
Jia Zou, Arun Iyengar, et al.
VLDB 2017
Jia Zou, Amitabh Das, et al.
VLDB