Improved Speech Enhancement Considering Speech PSD Uncertainty

Author(s): Kim, M.; Shin, Jong Won

Type: Article

Citation: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1939-1951

Issued Date: 2022-07

Abstract: Speech enhancement based on statistical models has been studied for several decades. Recently, a speech enhancement approach adopting a speech power spectral density (PSD) uncertainty model has been proposed. This approach distinguishes the true speech PSD from its estimate and treats both as random variables. It incorporates a prior distribution of speech spectra and speech PSD estimators to derive PSD uncertainty-aware counterparts to conventional clean speech estimators, which results in improved performance. However, the speech PSD uncertainty model has not yet been adopted for parameter estimation in the speech enhancement framework, such as the estimation of the speech presence probability, the noise PSD, and the speech power spectra. In this paper, we incorporate the speech PSD uncertainty model into all the components of the statistical model-based speech enhancement framework by deriving PSD uncertainty-aware counterparts to conventional parameter estimators. Specifically, we derive the speech presence probability (SPP) in which the likelihood function for each hypothesis is based on the speech PSD uncertainty. With this SPP, a novel SPP-based noise PSD estimator is derived. We also derive the minimum mean-square error (MMSE) estimator for the power spectrum of the clean speech in the current frame under speech PSD uncertainty, which is exploited to refine the speech PSD estimator. Finally, the refined speech PSD estimator is incorporated into the spectral gain function based on the speech PSD uncertainty model. In our experiments, the proposed approach showed improved noise PSD estimation performance in terms of the averaged logarithmic error distance, and improved speech enhancement performance in terms of noise reduction, segmental signal-to-noise ratio, perceptual evaluation of speech quality (PESQ) scores, and short-time objective intelligibility. It also exhibited performance comparable to that of a real-time deep learning-based speech enhancement system in terms of PESQ scores and composite measures on the VoiceBank-DEMAND dataset.
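For orientation, the kind of conventional SPP-based noise PSD estimator that the abstract refers to can be sketched as follows. This is not the uncertainty-aware estimator derived in the paper; it is a minimal sketch of a commonly used fixed-prior formulation for a single frequency bin. The function names, the 15 dB fixed a priori SNR, the equal hypothesis priors, and the smoothing constant 0.8 are all assumptions chosen for illustration.

```python
import math


def spp_fixed_prior(periodogram, noise_psd, xi_h1_db=15.0, p_h1=0.5):
    """Posterior speech presence probability for one time-frequency bin.

    Assumes complex Gaussian models under both hypotheses and a fixed
    a priori SNR under speech presence (illustrative values only).
    """
    xi = 10.0 ** (xi_h1_db / 10.0)                    # fixed a priori SNR, linear
    post_snr = periodogram / max(noise_psd, 1e-12)    # a posteriori SNR
    likelihood_ratio_inv = (
        ((1.0 - p_h1) / p_h1)
        * (1.0 + xi)
        * math.exp(-post_snr * xi / (1.0 + xi))
    )
    return 1.0 / (1.0 + likelihood_ratio_inv)


def update_noise_psd(periodogram, noise_psd, alpha=0.8):
    """One recursive update of the noise PSD estimate for one bin."""
    p = spp_fixed_prior(periodogram, noise_psd)
    # Estimate of the noise periodogram: trust the observation when speech
    # is unlikely, fall back to the previous estimate when speech is likely.
    noise_periodogram = (1.0 - p) * periodogram + p * noise_psd
    # First-order recursive smoothing over time.
    return alpha * noise_psd + (1.0 - alpha) * noise_periodogram


def track_noise(periodograms, initial_noise_psd, alpha=0.8):
    """Run the update over a sequence of noisy periodogram values."""
    estimate = initial_noise_psd
    history = []
    for value in periodograms:
        estimate = update_noise_psd(value, estimate, alpha)
        history.append(estimate)
    return history


if __name__ == "__main__":
    # Hypothetical input: the noisy periodogram sits at a constant level of 2.0
    # while the tracker starts from an initial noise PSD of 1.0.
    trace = track_noise([2.0] * 50, initial_noise_psd=1.0)
    print(trace[0], trace[-1])
```

In this baseline the likelihoods depend only on a point estimate of the noise PSD and a fixed a priori SNR. According to the abstract, the paper instead bases the likelihood for each hypothesis on the speech PSD uncertainty model.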

Publisher: Institute of Electrical and Electronics Engineers Inc.

ISSN: 2329-9290

DOI: 10.1109/TASLP.2022.3180676

URI: https://scholar.gist.ac.kr/handle/local/10740

Appears in Collections: Department of Electrical Engineering and Computer Science > 1. Journal Articles

Access and License

Access Type: Open
