George Saon
SLT 2014
We present the IBM speech activity detection system fielded in the phase 2 evaluation of the DARPA RATS (Robust Automatic Transcription of Speech) program. The key ingredients of the system are multi-pass HMM Viterbi segmentation, fusion of multiple feature streams, file-based and speech-based normalization schemes, regular and convolutional deep neural networks, and model fusion through frame-level score combination of channel-dependent models. These techniques were instrumental in achieving a 1.4% equal error rate on the RATS phase 2 evaluation data. Copyright © 2013 ISCA.
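To make two terms from the abstract concrete, frame-level score combination and equal error rate, here is a minimal Python sketch. It is not the IBM system: it assumes each model emits one speech score per frame, fuses them with a plain weighted average, and approximates the equal error rate by sweeping a decision threshold over the observed scores. The function names `fuse_scores` and `equal_error_rate`, the uniform default weights, and the toy data are all hypothetical, and the official RATS scoring rules may differ from this simple frame count.

```python
import numpy as np


def fuse_scores(model_scores, weights=None):
    """Weighted average of per-frame speech scores from several models.

    model_scores: array-like of shape (n_models, n_frames).
    weights: optional per-model weights (hypothetical; uniform if omitted).
    """
    scores = np.asarray(model_scores, dtype=float)
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    weights = np.asarray(weights, dtype=float)
    return weights @ scores  # shape (n_frames,)


def equal_error_rate(scores, labels):
    """Approximate EER: the operating point where miss and false-alarm rates meet.

    scores: per-frame speech scores (higher means more speech-like).
    labels: per-frame ground truth, truthy for speech. Both classes must occur.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_gap, eer = np.inf, 1.0
    for threshold in np.unique(scores):
        decide_speech = scores >= threshold
        miss = np.mean(~decide_speech[labels])          # speech frames rejected
        false_alarm = np.mean(decide_speech[~labels])   # non-speech frames accepted
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2.0
    return eer


# Toy data: two models, four frames; frames 0 and 2 are speech.
fused = fuse_scores([[0.9, 0.2, 0.8, 0.1],
                     [0.7, 0.4, 0.6, 0.3]])
toy_eer = equal_error_rate(fused, [1, 0, 1, 0])
```

On this toy data the fused scores separate speech from non-speech perfectly, so the sketch reports an equal error rate of zero. Real systems typically interpolate between thresholds rather than picking the nearest observed one.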
Charles Wiecha, Pedro Szekely
CHI EA 2001
Shay Maymon, Etienne Marcheret, et al.
INTERSPEECH 2013
D. Oliveira, R. Silva Ferreira, et al.
EAGE/PESGB Workshop Machine Learning 2018