OAK

Target exaggeration for deep learning-based speech enhancement

Abstract
Deep learning has been actively utilized for speech enhancement. However, deep learning-based speech enhancement usually produces over-smoothed speech, resulting in speech distortion and degraded intelligibility. In this paper, we propose exaggerating the training target so that the dynamic range of the enhanced speech becomes more similar to that of the clean speech. Target exaggeration can be implemented in two ways. The first approach is to exaggerate the target feature in the cost function of a deep learning-based speech enhancement system. This method requires no additional parameters or computation, but can only be applied to schemes that work in the time-frequency domain with the mean-square error cost function. The second approach is to introduce an additional deep neural network (DNN) that estimates the residual error in the output of a deep learning-based speech enhancement system. This requires more computation, but can be applied even to time-domain approaches. To evaluate the proposed target exaggeration, we apply it to feed-forward DNN- and long short-term memory (LSTM)-based speech enhancement schemes in the time-frequency domain, and to a convolutional time-domain audio separation network (Conv-TasNet)-based speech enhancement scheme in the time domain. Experimental results showed that the proposed method improved the quality of the speech produced by the deep learning-based speech enhancement systems in terms of perceptual evaluation of speech quality (PESQ) scores, and outperformed other approaches to alleviating the over-smoothing problem, including global variance equalization and a perceptually optimized speech denoising autoencoder. (C) 2021 Elsevier Inc. All rights reserved.
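The abstract does not give the exaggeration formula, so the following is only a minimal sketch of the first approach under an assumed definition: each frequency bin of the clean target feature (for example, a log-power spectrum of shape frames × bins) has its deviation from its temporal mean stretched by a hypothetical factor `alpha` > 1, and the network output is scored with the mean-square error against this stretched target. The function names `exaggerate_target` and `exaggerated_mse` and the mean-plus-scaled-deviation rule are illustrative assumptions, not the authors' method.

```python
import numpy as np


def exaggerate_target(clean_feat: np.ndarray, alpha: float = 1.2) -> np.ndarray:
    """Hypothetical target exaggeration (assumed form, not the paper's formula).

    Stretches each frequency bin's deviation from its temporal mean by alpha,
    which widens the dynamic range of the training target.
    clean_feat: array of shape (frames, freq_bins), e.g. log-power spectra.
    """
    mean = clean_feat.mean(axis=0, keepdims=True)  # per-bin mean over time
    return mean + alpha * (clean_feat - mean)


def exaggerated_mse(estimate: np.ndarray, clean_feat: np.ndarray,
                    alpha: float = 1.2) -> float:
    """Mean-square error between the network output and the exaggerated target.

    Only the target changes, so no extra parameters are introduced.
    """
    target = exaggerate_target(clean_feat, alpha)
    return float(np.mean((estimate - target) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(100, 257))  # synthetic stand-in for clean features
    ex = exaggerate_target(clean, alpha=1.5)
    # Per-bin variance grows by alpha**2, i.e. a wider dynamic range.
    print(np.allclose(ex.var(axis=0), 1.5 ** 2 * clean.var(axis=0)))
    # With alpha = 1 the loss reduces to the ordinary MSE against the clean target.
    est = 0.8 * clean  # an artificially "over-smoothed" estimate
    print(np.isclose(exaggerated_mse(est, clean, 1.0), np.mean((est - clean) ** 2)))
```

Because the per-bin mean is preserved and only the spread is scaled, a network trained toward this target is pushed away from the compressed, over-smoothed outputs that a plain MSE objective tends to favor. The value of `alpha` here is arbitrary and would have to be tuned.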
Author(s)
Kim, Hansol; Shin, Jong Won
Issued Date
2021-09
Type
Article
DOI
10.1016/j.dsp.2021.103109
URI
https://scholar.gist.ac.kr/handle/local/11327
Publisher
Elsevier Inc.
Citation
DIGITAL SIGNAL PROCESSING, v.116
ISSN
1051-2004
Appears in Collections:
Department of Electrical Engineering and Computer Science > 1. Journal Articles
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.