Dual Microphone Voice Activity Detection Based on Reliable Spatial Cues

Abstract
Two main spatial cues that can be exploited for dual microphone voice activity detection (VAD) are the interchannel time difference (ITD) and the interchannel level difference (ILD). Both ITD and ILD carry information on the location of audio sources, but background noise and reverberation can impair them in different ways, so the two cues can provide complementary information. Conventional approaches use the statistics from all frequencies with fixed weights, although the information from some time-frequency bins may degrade VAD performance. In this letter, we propose a dual microphone VAD scheme based on the spatial cues in reliable frequency bins only, considering the sparsity of the speech signal in the time-frequency domain. The reliability of each time-frequency bin is determined by three conditions on signal energy, ILD, and ITD. ITD-based and ILD-based VADs and statistics are evaluated using the information from the selected frequency bins and then combined to produce the final VAD results. Experimental results show that the proposed frequency-selective approach improves VAD performance in realistic environments.
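The bin-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the three reliability conditions in detail, so the tests used here (energy above a fraction of the frame maximum, bounded |ILD|, bounded |ITD|), all threshold values, the ITD sign convention, and the function names `spatial_cues`, `reliable_bins`, and `frame_statistics` are assumptions made for illustration only.

```python
import numpy as np

def spatial_cues(x1, x2, fs, eps=1e-12):
    """Per-bin energy, ILD (dB) and ITD (seconds) for one frame of two channels."""
    win = np.hanning(len(x1))
    X1 = np.fft.rfft(win * x1)
    X2 = np.fft.rfft(win * x2)
    freqs = np.fft.rfftfreq(len(x1), d=1.0 / fs)
    p1 = np.abs(X1) ** 2
    p2 = np.abs(X2) ** 2
    energy = p1 + p2
    # ILD: power ratio of channel 1 to channel 2 in dB.
    ild_db = 10.0 * np.log10((p1 + eps) / (p2 + eps))
    # ITD from the interchannel phase difference (assumed convention:
    # positive when channel 2 lags channel 1). The DC bin is left at zero.
    ipd = np.angle(X1 * np.conj(X2))
    itd_s = np.zeros_like(freqs)
    itd_s[1:] = ipd[1:] / (2.0 * np.pi * freqs[1:])
    return energy, ild_db, itd_s

def reliable_bins(energy, ild_db, itd_s,
                  rel_energy_floor=0.01,  # hypothetical threshold
                  ild_max_db=20.0,        # hypothetical threshold
                  itd_max_s=2.5e-4):      # hypothetical, e.g. mic spacing / speed of sound
    """Boolean mask of bins that pass all three illustrative conditions."""
    mask = ((energy > rel_energy_floor * energy.max())
            & (np.abs(ild_db) <= ild_max_db)
            & (np.abs(itd_s) <= itd_max_s))
    mask[0] = False  # the DC bin carries no usable phase information
    return mask

def frame_statistics(ild_db, itd_s, mask):
    """Mean ILD and ITD over the selected bins, or None if no bin is selected."""
    if not mask.any():
        return None
    return float(ild_db[mask].mean()), float(itd_s[mask].mean())
```

In a full detector, the per-frame statistics from the selected bins would feed separate ITD-based and ILD-based decisions that are then combined; that combination step is described only at a high level in the abstract and is left out of the sketch.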
Author(s)
Hwang, Soojoong; Jin, Yu Gwang; Shin, Jong Won
Issued Date
2019-07
Type
Article
DOI
10.3390/s19143056
URI
https://scholar.gist.ac.kr/handle/local/12622
Publisher
MDPI
Citation
SENSORS, v.19, no.14
ISSN
1424-8220
Appears in Collections:
Department of Electrical Engineering and Computer Science > 1. Journal Articles
Access and License
  • Access type: Open

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.