PS-ZCPA based feature extraction with auditory masking, modulation enhancement and noise reduction for robust ASR

Muhammad Ghulam; Takashi Fukuda; Kouichi Katsurada; Junsei Horikawa; Tsuneo Nitta

doi:10.1093/ietisy/e89-d.3.1015

IEICE Transactions on Information and Systems

Paper

01 Jan 2006



Abstract

A pitch-synchronous (PS) auditory feature extraction method based on ZCPA (Zero-Crossings Peak-Amplitudes) was proposed previously and showed greater robustness than conventional ZCPA- and MFCC-based features. In this paper, a non-linear adaptive threshold adjustment procedure is first introduced into the PS-ZCPA method to obtain optimal results under noisy conditions with different signal-to-noise ratios (SNRs). Next, auditory masking, a well-known auditory perception phenomenon, and modulation enhancement, which models the strong relationship between the modulation spectrum and speech intelligibility, are embedded into the PS-ZCPA method. Finally, a Wiener-filter-based noise reduction procedure is integrated into the method to make it more noise-robust, and its performance is evaluated against ETSI ES202 (WI008), a standard front-end for distributed speech recognition. All experiments were carried out on the Aurora-2J database. The results showed that embedding auditory masking improved the performance of the PS-ZCPA method, and that modulation enhancement improved it slightly. The PS-ZCPA method with Wiener-filter-based noise reduction also outperformed ETSI ES202 (WI008). Copyright © 2006 The Institute of Electronics, Information and Communication Engineers.
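The abstract does not spell out how ZCPA features are computed. As a rough illustration only, the sketch below follows the general ZCPA idea the name describes: in each band-pass channel, the interval between successive upward zero crossings gives a frequency estimate, and the (log-compressed) peak amplitude within that interval is added to the histogram bin for that frequency. It is not the authors' PS-ZCPA implementation. Pitch-synchronous processing, the adaptive threshold, auditory masking, modulation enhancement and Wiener filtering are all omitted. The function name `zcpa_histogram`, the `log1p` compression and the bin edges are hypothetical choices, and the input is assumed to be an already filtered (cochlear-like) filterbank output for one frame.

```python
import numpy as np

def zcpa_histogram(channel_signals, fs, bin_edges):
    """Accumulate a ZCPA-style frequency histogram for one analysis frame.

    channel_signals: array of shape (n_channels, n_samples), assumed to be
        band-pass filterbank outputs (hypothetical input format).
    fs: sampling rate in Hz.
    bin_edges: increasing frequency bin edges in Hz (hypothetical choice).
    """
    hist = np.zeros(len(bin_edges) - 1)
    for x in channel_signals:
        # Upward zero crossings: x[n-1] < 0 <= x[n]; store the index n.
        up = np.flatnonzero((x[:-1] < 0) & (x[1:] >= 0)) + 1
        for start, end in zip(up[:-1], up[1:]):
            # Frequency estimate = inverse of the zero-crossing interval.
            freq = fs / (end - start)
            # Peak amplitude within the interval (non-negative, since the
            # interval begins on an upward crossing).
            peak = np.max(x[start:end])
            k = np.searchsorted(bin_edges, freq, side="right") - 1
            if 0 <= k < len(hist):
                # Log compression stands in for the amplitude nonlinearity.
                hist[k] += np.log1p(peak)
    return hist
```

For example, a single channel carrying a 1 kHz sine sampled at 16 kHz has upward crossings every 16 samples, so all of its energy lands in the bin spanning 1000 Hz.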
