
Time-domain speaker verification using temporal convolutional networks

Abstract
Recently, speaker verification systems using deep neural networks have been widely studied. Many of them utilize hand-crafted features such as mel-filterbank energies, mel-frequency cepstral coefficients, and magnitude spectrograms, which are not designed specifically for the speaker verification task and may not be optimal. Recent releases of large datasets such as VoxCeleb enable us to extract task-specific features in a data-driven way. In this paper, we propose a speaker verification system that takes time-domain raw waveforms as inputs and adopts a learnable encoder together with temporal convolutional networks (TCNs), which have shown impressive performance in speech separation. Moreover, we apply a squeeze-and-excitation network after each TCN block to perform channel-wise attention. Our experiments on the VoxCeleb1 dataset demonstrate that the speaker verification system utilizing the proposed feature extraction model outperforms previously proposed time-domain speaker verification systems.
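The channel-wise attention mentioned in the abstract follows the standard squeeze-and-excitation recipe: global average pooling over time, a bottleneck MLP, and a sigmoid gate that rescales each channel. Below is a minimal NumPy sketch of that operation applied to the output of one TCN block; the dimensions, reduction ratio, and weight initialization are illustrative placeholders, not the paper's actual configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(features, w1, b1, w2, b2):
    """Squeeze-and-excitation over a (channels, time) feature map,
    e.g. the output of one TCN block."""
    z = features.mean(axis=1)                           # squeeze: average over time -> (C,)
    s = sigmoid(w2 @ np.maximum(0.0, w1 @ z + b1) + b2) # excitation: ReLU bottleneck + sigmoid gate
    return features * s[:, None]                        # rescale each channel by its gate in (0, 1)

# Toy dimensions (hypothetical; the paper's sizes differ)
C, T, r = 8, 16, 2                                      # channels, time steps, reduction ratio
rng = np.random.default_rng(0)
x = rng.standard_normal((C, T))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
y = se_block(x, w1, b1, w2, b2)                         # same shape as x, channels reweighted
```

Because the gate is a sigmoid, each channel is scaled by a factor in (0, 1), so the block can only attenuate channels relative to one another; this keeps the attention lightweight enough to insert after every TCN block.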
Author(s)
Han, Sangwook; Byun, Jaeuk; Shin, Jong Won
Issued Date
2021-06-10
Type
Conference Paper
DOI
10.1109/ICASSP39728.2021.9414765
URI
https://scholar.gist.ac.kr/handle/local/22078
Publisher
Institute of Electrical and Electronics Engineers Inc.
Citation
2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021, pp. 6688-6692
ISSN
1520-6149
Conference Place
CN
Appears in Collections:
Department of Electrical Engineering and Computer Science > 2. Conference Papers
Access and License
  • Access status: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.