Audio enhancement using local SNR-based sparse binary mask estimation and spectral imputation
- Abstract
- This paper proposes a method for enhancing speech and/or audio quality under noisy conditions. The proposed method first estimates the local signal-to-noise ratio (SNR) of the noisy input signal via sparse non-negative matrix factorization (SNMF). Next, a sparse binary mask (SBM) separates the audio signal from the noise by measuring the sparsity of the pool of local SNRs from the adjacent frequency bands of the current and several previous frames. However, applying the binary mask leaves spectral gaps across frequency bands, and the resulting spectral discontinuity distorts the separated audio signal. Thus, a spectral imputation technique is used to fill in the spectrum of each frequency band that the SBM has removed. Spectral imputation is conducted by online learning NMF with the spectra of the neighboring non-overlapped frequency bands and their local sparsity. The effectiveness of the proposed enhancement method is demonstrated on two different tasks using speech and musical content, respectively. Objective measurements and subjective listening tests show that the proposed method outperforms conventional speech and audio enhancement methods, such as SNMF-based alternatives and deep recurrent neural networks for speech enhancement, block thresholding, and a commercially available software tool for audio enhancement. (C) 2017 Elsevier Inc. All rights reserved.
- Author(s)
- Jeon, Kwang Myung; Kim, Hong Kook
- Issued Date
- 2017-09
- Type
- Article
- DOI
- 10.1016/j.dsp.2017.06.001
- URI
- https://scholar.gist.ac.kr/handle/local/13618
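The masking step summarized in the abstract (pooling local SNRs from adjacent frequency bands over the current and several previous frames, then measuring the sparsity of that pool) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `hoyer_sparsity` and `sparse_binary_mask`, the use of Hoyer's sparseness measure, the pool size (`band_radius`, `n_prev`), both thresholds, and the decision rule itself (keep a bin when its own local SNR is high and the pool is not sparse). The local SNR matrix is taken as given, so the SNMF-based SNR estimation and the spectral imputation step are not shown.

```python
import numpy as np


def hoyer_sparsity(x, eps=1e-12):
    """Hoyer sparseness of a vector: 0 for a flat vector, 1 for a single nonzero entry."""
    x = np.abs(np.asarray(x, dtype=float)).ravel()
    n = x.size
    if n < 2:
        return 0.0
    l1 = x.sum()
    l2 = np.sqrt((x ** 2).sum())
    if l2 < eps:  # all-zero pool: treat as not sparse
        return 0.0
    return float((np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0))


def sparse_binary_mask(local_snr, band_radius=1, n_prev=2,
                       sparsity_thresh=0.5, snr_thresh=1.0):
    """Binary mask from a (bands, frames) matrix of non-negative linear-scale local SNRs.

    Hypothetical rule (an assumption, not the paper's): a bin is kept when its own
    local SNR reaches snr_thresh and the pool of local SNRs around it (adjacent
    bands, current and previous frames) is not sparse.
    """
    local_snr = np.asarray(local_snr, dtype=float)
    n_bands, n_frames = local_snr.shape
    mask = np.zeros((n_bands, n_frames), dtype=np.uint8)
    for t in range(n_frames):
        t0 = max(0, t - n_prev)  # first frame in the pool
        for k in range(n_bands):
            k0 = max(0, k - band_radius)
            k1 = min(n_bands, k + band_radius + 1)
            pool = local_snr[k0:k1, t0:t + 1]
            if (local_snr[k, t] >= snr_thresh
                    and hoyer_sparsity(pool) <= sparsity_thresh):
                mask[k, t] = 1
    return mask
```

Under these assumptions, a region of uniformly high local SNR is kept in full, while an isolated high-SNR bin surrounded by zeros is rejected because its pool is maximally sparse. Multiplying the mask element-wise with the band-wise noisy magnitude spectrum would zero the rejected bands, which are the gaps that the abstract says spectral imputation is meant to fill.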
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.