Michael Picheny, Zoltan Tuske, et al.
INTERSPEECH 2019
In this paper, we investigate the use of TemPoRal PatternS (TRAPS) classifiers for estimating manner of articulation features on the small-vocabulary Aurora-2002 database. By combining a stream of TRAPS-estimated manner features with a stream of noise-robust MFCC features (earlier proposed in the Aurora-2002 evaluation by OGI, ICSI and Qualcomm), we obtain an average absolute improvement of 0.4% to 1.0% in word recognition accuracy over the noise-robust MFCC baseline features on Aurora tasks. This yields an average relative improvement of 54% over the reference end-pointed MFCC baseline. Estimation of the manner features can be performed on the server without increasing the terminal-side computational complexity in a distributed speech recognition (DSR) system.
Hagen Soltau, George Saon, et al.
IEEE Transactions on Audio, Speech and Language Processing
Jennifer C. Lai, Kwan Min Lee
ICSLP 2002
Jing Huang, Brian Kingsbury
ICASSP 2013