TAU-Net: Temporal Activation U-Net Shared With Nonnegative Matrix Factorization for Speech Enhancement in Unseen Noise Environments
- Author(s)
- Jeon, Kwang Myung; Lee, Geon Woo; Kim, Nam Kyun; Kim, Hong Kook
- Type
- Article
- Citation
- IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, v.29, pp.3400 - 3414
- Issued Date
- 2021-11
- Abstract
- In this paper, a novel speech enhancement method based on a hybrid machine-learning architecture consisting of U-Net and nonnegative matrix factorization (NMF) is proposed. The proposed method attempts to take advantage of both the accurate separation for known noise environments by U-Net and the adaptation to unseen noises by an NMF with an online dictionary learning technique. To merge the two different architectures, a modified U-Net with a temporal activation layer (TAU-Net) is jointly optimized with NMF models that represent universal speech and noise. The proposed method first estimates the temporal activations from the encoder of the proposed TAU-Net. Then, an NMF with online dictionary learning adjusts the initially given temporal activations to suppress their cross-activations due to unseen noises that are unknown in the training phase of TAU-Net. Finally, clean speech is obtained by passing the adjusted temporal activations to the TAU-Net decoder. The effectiveness of the proposed TAU-Net-based speech enhancement method is evaluated in various unseen noise environments. Consequently, the proposed method achieves a substantial improvement with average signal-to-distortion ratios of 2.32 dB and 5.68 dB, which are higher than those of the baseline methods such as speech enhancement generative adversarial network (SEGAN) and U-Net, respectively.
- Publisher
- IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
- ISSN
- 2329-9290
- DOI
- 10.1109/TASLP.2021.3067154
- URI
- https://scholar.gist.ac.kr/handle/local/11209
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.