Improved Speech Enhancement Considering Speech PSD Uncertainty

Author(s)
Kim, M.; Shin, Jong Won
Type
Article
Citation
IEEE/ACM Transactions on Audio Speech and Language Processing, v.30, pp.1939 - 1951
Issued Date
2022-07
Abstract
Speech enhancement based on statistical models has been studied for several decades. Recently, a speech enhancement approach adopting a speech power spectral density (PSD) uncertainty model has been proposed. This approach distinguishes the true speech PSD from its estimate and treats both as random variables. It incorporates a prior distribution of speech spectra and speech PSD estimators to derive PSD uncertainty-aware counterparts to conventional clean speech estimators, which results in improved performance. However, the speech PSD uncertainty model has not yet been adopted for parameter estimation, such as the estimation of the speech presence probability, noise PSD, and speech power spectra, in the speech enhancement framework. In this paper, we incorporate the speech PSD uncertainty model into all the components of the statistical model-based speech enhancement framework by deriving PSD uncertainty-aware counterparts to conventional parameter estimators. Specifically, we derive the speech presence probability (SPP) where the likelihood function for each hypothesis is based on the speech PSD uncertainty. With this SPP, a novel SPP-based noise PSD estimator is derived. We also derive the minimum mean-square error (MMSE) estimator for the power spectrum of the clean speech in the current frame under speech PSD uncertainty, which is exploited to refine the speech PSD estimator. Finally, the refined speech PSD estimator is incorporated into the spectral gain function based on the speech PSD uncertainty model. In our experiments, the proposed approach showed improved noise PSD estimation performance in terms of the averaged logarithmic error distance, and improved speech enhancement performance in terms of noise reduction, segmental signal-to-noise ratio, perceptual evaluation of speech quality (PESQ) scores, and short-time objective intelligibility.
It also exhibited performance comparable to that of a real-time deep learning-based speech enhancement system in terms of the PESQ scores and composite measures for the VoiceBank-DEMAND dataset.
Publisher
Institute of Electrical and Electronics Engineers Inc.
ISSN
2329-9290
DOI
10.1109/TASLP.2022.3180676
URI
https://scholar.gist.ac.kr/handle/local/10740
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.