GMM-based dual-channel voice activity detection with a correction scheme
- Abstract
- This paper proposes a voice activity detection (VAD) method for a dual-channel environment, based on Gaussian mixture models (GMMs) built from spatial cues and logarithmic root-mean-squared energy. Each GMM is constructed according to the direction of arrival, and speech intervals are detected under the assumption that the target speech source is located in front of the dual-channel microphones. To reduce VAD errors, especially in unvoiced intervals of the target speech, a VAD correction scheme is incorporated that uses the ratio between the energies of the high- and low-frequency bands (HILO). Performance is evaluated by measuring the false rejection and false alarm rates of the proposed VAD against manual segmentation. The proposed GMM-based VAD method with HILO-based correction outperforms both a Gaussian kernel density-based VAD method and a GMM-based VAD method without VAD correction. © 2012 ICIC International.
- Author(s)
- Park, J.H.; Kim, Hong Kook
- Issued Date
- 2012-02
- Type
- Article
- URI
- https://scholar.gist.ac.kr/handle/local/16038
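The abstract names two frame-level quantities, logarithmic root-mean-squared energy and the high-to-low band energy ratio (HILO), plus a correction step for unvoiced speech. The following is a minimal sketch of how such quantities might be computed, not the authors' implementation. The band split frequency (`split_hz`), the HILO threshold, and the neighbor-based correction rule in `correct_vad` are all assumptions made for illustration; the record does not give these details. The GMM classifier itself is omitted.

```python
import numpy as np


def log_rms_energy(frame, eps=1e-12):
    """Logarithm of the root-mean-squared energy of one frame."""
    return float(np.log(np.sqrt(np.mean(frame ** 2)) + eps))


def hilo_ratio(frame, sample_rate, split_hz=2000.0, eps=1e-12):
    """Ratio of high-band to low-band spectral energy.

    split_hz is an assumed band boundary, not a value from the paper.
    """
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    high = spectrum[freqs >= split_hz].sum()
    low = spectrum[freqs < split_hz].sum()
    return float(high / (low + eps))


def correct_vad(decisions, hilo, hilo_threshold=1.0):
    """Hypothetical correction rule: relabel a non-speech frame as speech
    when it is adjacent to a speech frame and its HILO ratio exceeds
    hilo_threshold (unvoiced sounds tend to carry more high-band energy).
    """
    decisions = np.asarray(decisions, dtype=bool)
    hilo = np.asarray(hilo, dtype=float)
    # Pad with False so the first and last frames have a defined neighbor.
    padded = np.pad(decisions, 1, constant_values=False)
    near_speech = padded[:-2] | padded[2:]
    return decisions | (near_speech & (hilo > hilo_threshold))
```

In this sketch the initial `decisions` would come from whatever classifier labels frames as speech or non-speech (a GMM in the paper); the correction only ever adds speech frames, which targets false rejections at the cost of possible extra false alarms.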