OAK

DNN-based monaural speech enhancement with temporal and spectral variations equalization

Abstract
Deep neural networks (DNNs) have recently been applied successfully to speech enhancement. Conventional DNN-based algorithms generally produce over-smoothed output features, which degrade the quality of the enhanced speech. In addition, their performance measures are calculated on a linear frequency scale, which does not match human auditory perception, whose sensitivity follows the Mel frequency scale. In this paper, we propose a novel objective function for DNN-based speech enhancement. The objective function, employed in the DNN training stage, consists of a Mel-scale weighted mean square error together with temporal and spectral variation similarities between the enhanced and clean speech. It computes gradients on a perceptually motivated non-linear frequency scale and alleviates the over-smoothing of the estimated speech. In experiments, the proposed algorithm was compared with the conventional DNN-based speech enhancement algorithm under matched and mismatched noise conditions. The results show that the proposed algorithm outperforms the conventional algorithm on both objective and subjective measures. (C) 2017 Elsevier Inc. All rights reserved.
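The abstract names the three ingredients of the objective function but does not give its formula. The sketch below is one possible reading, not the authors' formulation: the weighting rule in `mel_weights`, the use of standard deviations as the "variation" measure, and the mixing factors `alpha` and `beta` are all assumptions made for illustration. It only evaluates the loss with NumPy; actual training would need an automatic-differentiation framework.

```python
import numpy as np

def mel_weights(n_bins, sample_rate=16000):
    """Hypothetical per-bin weights that follow the slope of the Mel curve."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # mel(f) = 2595 * log10(1 + f / 700), so d mel / d f is proportional to 1 / (700 + f).
    w = 1.0 / (700.0 + freqs)
    return w / w.mean()  # normalize so the weights average to 1

def variation_equalized_loss(enhanced, clean, weights, alpha=0.1, beta=0.1):
    """Assumed loss form; enhanced and clean are (frames, bins) spectral features."""
    # Mel-weighted mean square error: low-frequency bins count more.
    wmse = np.mean(weights * (enhanced - clean) ** 2)
    # Temporal variation: spread of each frequency bin across frames.
    t_term = np.mean((enhanced.std(axis=0) - clean.std(axis=0)) ** 2)
    # Spectral variation: spread of each frame across frequency bins.
    s_term = np.mean((enhanced.std(axis=1) - clean.std(axis=1)) ** 2)
    return wmse + alpha * t_term + beta * s_term
```

Under these assumptions, an over-smoothed estimate such as `0.5 * clean` is penalized by the two variation terms in addition to the weighted error, which is the effect the abstract attributes to the proposed objective.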
Author(s)
Kang, Tae Gyoon; Shin, Jong Won; Kim, Nam Soo
Issued Date
2018-03
Type
Article
DOI
10.1016/j.dsp.2017.12.002
URI
https://scholar.gist.ac.kr/handle/local/13382
Publisher
Academic Press
Citation
Digital Signal Processing: A Review Journal, v.74, pp.102 - 110
ISSN
1051-2004
Appears in Collections:
Department of Electrical Engineering and Computer Science > 1. Journal Articles
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.