Murat Saraclar, Abhinav Sethy, et al.
ASRU 2013
Neural Network (NN) Acoustic Models (AMs) are usually trained using context-dependent Hidden Markov Model (CD-HMM) states as independent targets. For example, the CD-HMM states A-b-2 (second variant of the beginning state of A) and A-m-1 (first variant of the middle state of A) both correspond to the phone A, and A-b-1 and A-b-2 both correspond to the context-independent HMM (CI-HMM) state A-b, but this relationship is not explicitly modeled. We propose a method that treats some neurons in the final hidden layer, just below the output layer, as dedicated neurons for phones or CI-HMM states by initializing the connections between the dedicated neurons and the corresponding CD-HMM outputs with stronger weights than those to other outputs. We obtained 6.5% and 3.6% relative error reductions with a DNN AM and a CNN AM, respectively, on a 50-hour English broadcast news task, and a 4.6% reduction with a CNN AM on a 500-hour Japanese task, in all cases after Hessian-free sequence training. Our proposed method only changes the NN parameter initialization and requires no additional computation in NN training or at speech recognition runtime.
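The initialization described above can be illustrated with a minimal NumPy sketch. All sizes, the CD-state-to-phone mapping, and the "strong" weight value below are hypothetical; the abstract does not specify them. The idea is only that each CD-HMM output unit starts with a stronger connection to the dedicated hidden neuron of its phone:

```python
import numpy as np

# Hypothetical mapping from each CD-HMM output index to its phone
# (e.g. A-b-1, A-b-2, A-m-1 all map to phone A).
cd_state_to_phone = [0, 0, 0, 1, 1, 2, 2, 2, 2]
num_phones = 3
num_outputs = len(cd_state_to_phone)
hidden_dim = 16  # final hidden layer; the first num_phones neurons are "dedicated"

rng = np.random.default_rng(0)
# Standard small random initialization of the output-layer weight matrix.
W = rng.normal(0.0, 0.05, size=(num_outputs, hidden_dim))

STRONG = 1.0  # assumed "stronger" initial weight
for out_idx, phone in enumerate(cd_state_to_phone):
    # Strengthen the link from each CD-HMM output to its phone's dedicated neuron.
    W[out_idx, phone] = STRONG

# Training then proceeds as usual; only the initialization has changed.
```

Because only `W`'s initial values change, the forward pass, training procedure, and decoding are untouched, which matches the abstract's claim of no additional computation.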