Dual-microphone voice activity detection incorporating Gaussian mixture models with an error correction scheme in non-stationary noise environments
- Abstract
- This paper proposes a voice activity detection (VAD) method based on Gaussian mixture models (GMMs) that exploits spatial selectivity in dual-microphone environments. A separate GMM is constructed for each direction-of-arrival (DOA) and used to detect speech intervals. Under the assumption that the target speech is located in front of the two microphones, VAD is performed by comparing the likelihood obtained from the GMM constructed for the frontal direction with the likelihoods obtained from the GMMs for the other DOAs. In addition, to mitigate false rejection errors caused by the low spatial correlation in unvoiced intervals of the target speech, the VAD results are refined by an error correction scheme. This scheme analyzes the ratio between the energies of the high and low frequency bands (HILO) to discriminate between an unvoiced speech interval and a non-speech interval. The proposed GMM-based VAD method with HILO-based error correction is evaluated by measuring the false alarm rate (FAR) and false rejection rate (FRR), both computed against manual segmentation, and comparing them with those of conventional dual-microphone VAD methods. The evaluation shows that the proposed method outperforms both a Gaussian kernel density-based VAD method and a GMM-based VAD method without error correction. © 2013 ICIC International.
- Author(s)
- Park, J.H.; Kim, Hong Kook
- Issued Date
- 2013-01
- Type
- Article
- URI
- https://scholar.gist.ac.kr/handle/local/15688
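The pipeline summarized in the abstract (a per-DOA likelihood comparison, a HILO-based correction, and FAR/FRR scoring against manual labels) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no decision rule, band edges, or thresholds, so the "front GMM must beat every other DOA" rule, the 2 kHz band split, the HILO threshold of 1.0, and all function names below are assumptions chosen only for illustration.

```python
import numpy as np


def gmm_vad_decision(loglik_by_doa, front_index):
    """Mark a frame as speech when the front-DOA GMM scores highest.

    loglik_by_doa: array of shape (frames, n_doa) holding per-frame
    log-likelihoods from each DOA-specific GMM (computed elsewhere).
    The argmax rule is an assumption; the abstract only says the
    likelihoods are compared.
    """
    loglik_by_doa = np.asarray(loglik_by_doa, dtype=float)
    return np.argmax(loglik_by_doa, axis=1) == front_index


def hilo_ratio(frame, sample_rate, split_hz=2000.0):
    """Ratio of high-band to low-band spectral energy for one frame.

    split_hz is a hypothetical band edge, not a value from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    low = spectrum[freqs < split_hz].sum()
    high = spectrum[freqs >= split_hz].sum()
    return high / (low + 1e-12)  # small constant avoids division by zero


def correct_vad(vad, hilo, threshold=1.0):
    """Relabel rejected frames as speech when HILO exceeds a threshold.

    Unvoiced speech tends to carry more high-band energy, so a large
    ratio is treated here as evidence of speech. The threshold is an
    assumed value.
    """
    vad = np.asarray(vad, dtype=bool)
    hilo = np.asarray(hilo, dtype=float)
    return vad | (hilo > threshold)


def far_frr(predicted, reference):
    """False alarm rate and false rejection rate against manual labels."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    non_speech = ~reference
    far = (predicted & non_speech).sum() / max(non_speech.sum(), 1)
    frr = (~predicted & reference).sum() / max(reference.sum(), 1)
    return float(far), float(frr)
```

In this sketch the correction can only turn a rejection into a detection, which matches the stated goal of reducing false rejections in unvoiced intervals. Applied to every frame as written, it would also raise the FAR in noise with strong high-frequency content, so a practical system would likely restrict it, for example to frames adjacent to detected speech; that restriction is likewise an assumption rather than something the abstract states.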
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.