Feature Extraction Using Temporal Convolutional Network for Time-domain Speaker Verification
- Author(s)
- Sangwook Han
- Type
- Thesis
- Degree
- Master
- Department
- Graduate School, School of Electrical Engineering and Computer Science
- Advisor
- Shin, Jong Won
- Abstract
- Recently, end-to-end speaker verification systems using deep neural networks have been widely studied. However, many studies in speaker verification have used hand-crafted features such as mel-filterbank energies, mel-frequency cepstral coefficients, and spectrograms. Although these features carry more interpretable information than the raw waveform, they are not optimal inputs for deep learning models. In this paper, inspired by the success of end-to-end time-domain learning in source separation, we propose a speaker feature extraction model for time-domain speaker verification that uses a temporal convolutional network (TCN) consisting of stacked 1-D dilated convolutional blocks combined with a squeeze-and-excitation (SE) block. Our experiments on the VoxCeleb1 dataset demonstrate that the proposed feature extraction model shows competitive performance compared to other networks operating on the raw waveform.
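The two building blocks named in the abstract, dilated 1-D convolution and squeeze-and-excitation gating, can be sketched in plain Python. This is a minimal illustration of the operations, not the thesis's actual model: kernel values, the causal padding choice, and the toy sigmoid gate (standing in for the learned SE bottleneck MLP) are all assumptions for demonstration.

```python
import math

def dilated_conv1d(x, kernel, dilation=1):
    """Causal 1-D dilated convolution over a list of samples.
    Positions before the start of the signal are zero-padded, so the
    output has the same length as the input."""
    k = len(kernel)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - (k - 1 - i) * dilation  # dilation widens the receptive field
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

def squeeze_excite(channels):
    """SE-style channel gating: global average pooling per channel
    ("squeeze"), then rescaling each channel by a gate ("excite").
    A toy sigmoid of the channel mean replaces the learned MLP here."""
    rescaled = []
    for ch in channels:
        mean = sum(ch) / len(ch)              # squeeze: global average pool
        gate = 1.0 / (1.0 + math.exp(-mean))  # excite: illustrative sigmoid gate
        rescaled.append([gate * v for v in ch])
    return rescaled
```

Stacking such dilated blocks with exponentially growing dilation (1, 2, 4, ...) is what gives a TCN a large receptive field over the raw waveform, while the SE gate reweights channels by their global statistics.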
- URI
- https://scholar.gist.ac.kr/handle/local/33034
- Fulltext
- http://gist.dcollection.net/common/orgView/200000909052