Improved voice activity detection using static harmonic features
Takashi Fukuda, Osamu Ichikawa, et al.
ICASSP 2010
This paper proposes a novel weighting algorithm for Cross-power Spectrum Phase (CSP) analysis to improve the accuracy of direction of arrival (DOA) estimation for beamforming in a noisy environment. Our sound source is a human speaker and the noise is broadband noise in an automobile. The harmonic structures in the human speech spectrum can be used for weighting the CSP analysis, because harmonic bins must contain more speech power than the others and thus give us more reliable information. However, most conventional methods leveraging harmonic structures require pitch estimation with voiced-unvoiced classification, which is not sufficiently accurate in noisy environments. In our new approach, the observed power spectrum is directly converted into weights for the CSP analysis by retaining only the local peaks considered to be harmonic structures. Our experiment showed the proposed approach significantly reduced the errors in localization, and it showed further improvements when used with other weighting algorithms. © 2010 Osamu Ichikawa et al.
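The weighting idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes a two-microphone frame, a binary weight that is 1 at strict local maxima of the power spectrum and 0 elsewhere, and the standard phase-normalized cross-spectrum as the CSP function. The function names `local_peak_weights` and `weighted_csp_delay` and the `floor` and `eps` parameters are hypothetical, and the paper's actual conversion from power spectrum to weights may differ.

```python
import numpy as np


def local_peak_weights(power, floor=0.0):
    """Weight 1.0 at strict local maxima of `power`, `floor` elsewhere.

    Assumption: a plain local-peak mask stands in for the paper's
    harmonic-structure weighting; the end bins are never treated as peaks.
    """
    power = np.asarray(power, dtype=float)
    w = np.full(power.shape, floor, dtype=float)
    is_peak = (power[1:-1] > power[:-2]) & (power[1:-1] > power[2:])
    w[1:-1][is_peak] = 1.0  # writes through the slice view into w
    return w


def weighted_csp_delay(x1, x2, weights=None, eps=1e-12):
    """Estimate by how many samples x2 lags x1 using (weighted) CSP.

    x1, x2  : equal-length real frames from two microphones
    weights : optional per-bin weights of length len(x1) // 2 + 1
    """
    n = len(x1)
    X1 = np.fft.rfft(x1)
    X2 = np.fft.rfft(x2)
    cross = X2 * np.conj(X1)
    # Keep only the phase of the cross-power spectrum.
    phase = cross / (np.abs(cross) + eps)
    if weights is not None:
        phase = phase * weights
    csp = np.fft.irfft(phase, n=n)
    lag = int(np.argmax(csp))
    # Map circular lags in the upper half to negative delays.
    if lag > n // 2:
        lag -= n
    return lag
```

In use, the weights would be computed per frame from the observed spectrum, for example `local_peak_weights(np.abs(np.fft.rfft(x1)) ** 2)`, and the estimated lag would then be converted to an arrival angle from the microphone spacing and sampling rate, which are not given here.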