Noise-Robust Speaker Verification With Attenuated Speech Restoration and Consistency Training
- Abstract
- Although the performance of speaker verification (SV) has been significantly improved by deep learning approaches, it may degrade severely in the presence of background noise. Simple approaches to relieve this issue are multi-condition training (MCT) and adopting a speech enhancement (SE) module as a pre-processor. However, whether or not it is jointly trained with the SV module, the SE module may occasionally cause speech attenuation, which leads to a partial loss of speaker information. To address this problem, we propose a noise-robust SV system whose SE front-end incorporates a speech restoration module based on lost-information aggregation, together with consistency training. In the speech restoration module, the lost information obtained from the noisy and enhanced latent representations, processed with different receptive field sizes, is aggregated to produce restored speech features using a loss function that penalizes speech attenuation. Moreover, to further improve robustness to background noise and unseen data, we adopt consistency training that makes the speaker embeddings of noisy speech similar to those of the corresponding clean speech obtained by a pre-trained SV model. Our experimental results demonstrate that the proposed system significantly improves speaker verification performance on the VoxCeleb dataset mixed with environmental noises and exhibits generalization capability on the CHiME-4 dataset.
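The abstract describes two training ideas: a restoration loss that penalizes speech attenuation on features aggregated from the noisy and enhanced latents processed with different receptive fields, and a consistency loss that pulls noisy-speech embeddings toward clean-speech embeddings from a frozen pre-trained SV model. The PyTorch sketch below only illustrates these ideas under assumed shapes, kernel sizes, and loss weights; it is not the authors' implementation, and all module and parameter names are hypothetical.

```python
# Illustrative sketch of the two objectives named in the abstract.
# Shapes, kernel sizes, and the weight `alpha` are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def attenuation_penalized_loss(restored, clean, alpha=2.0):
    """Spectral loss that weights under-estimation (attenuated speech)
    more heavily than over-estimation; alpha > 1 is an assumed weight."""
    diff = clean - restored
    over = F.relu(-diff)    # restored > clean: over-estimation
    under = F.relu(diff)    # restored < clean: attenuation
    return (over.pow(2) + alpha * under.pow(2)).mean()


def consistency_loss(noisy_emb, clean_emb):
    """Pull noisy-speech embeddings toward clean-speech embeddings
    produced by a frozen pre-trained SV model (cosine distance)."""
    return 1.0 - F.cosine_similarity(noisy_emb, clean_emb, dim=-1).mean()


class MultiReceptiveAggregator(nn.Module):
    """Toy stand-in for lost-information aggregation: noisy and enhanced
    latent features are processed with convolutions of different
    receptive fields and fused to estimate the attenuated components."""
    def __init__(self, channels=64):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(2 * channels, channels, k, padding=k // 2)
            for k in (3, 5, 9)          # assumed receptive-field sizes
        ])
        self.fuse = nn.Conv1d(3 * channels, channels, 1)

    def forward(self, noisy_latent, enhanced_latent):
        x = torch.cat([noisy_latent, enhanced_latent], dim=1)
        outs = [branch(x) for branch in self.branches]
        return self.fuse(torch.cat(outs, dim=1))


# Example training step with dummy tensors (batch x channels x frames
# latents, batch x dim embeddings); real features would come from the
# SE front-end and the SV encoders.
restorer = MultiReceptiveAggregator(channels=64)
noisy_latent = torch.randn(4, 64, 200)
enhanced_latent = torch.randn(4, 64, 200)
clean_latent = torch.randn(4, 64, 200)

restored = enhanced_latent + restorer(noisy_latent, enhanced_latent)
loss_restore = attenuation_penalized_loss(restored, clean_latent)

noisy_emb = torch.randn(4, 192, requires_grad=True)  # trainable noisy branch
with torch.no_grad():
    clean_emb = torch.randn(4, 192)                  # frozen pre-trained SV model
loss_total = loss_restore + consistency_loss(noisy_emb, clean_emb)
loss_total.backward()
```

In this sketch the clean-speech embedding is computed without gradients, so the consistency term only adapts the noisy-speech branch, mirroring the role of the frozen pre-trained SV model described in the abstract.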
- Author(s)
- Han, Sangwook; Ahn, Youngdo; Shin, Jong Won
- Issued Date
- 2025
- Type
- Article
- DOI
- 10.1109/TASLPRO.2025.3567758
- URI
- https://scholar.gist.ac.kr/handle/local/18798