OAK

Feature Extraction Using Temporal Convolutional Network for Time-domain Speaker Verification

Author(s)
Sangwook Han
Type
Thesis
Degree
Master
Department
School of Electrical Engineering and Computer Science, Graduate School
Advisor
Shin, Jong Won
Abstract
Recently, end-to-end speaker verification systems using deep neural networks have been widely studied. However, many studies in speaker verification have relied on hand-crafted features such as mel-filterbank energies, mel-frequency cepstral coefficients, and spectrograms. Although these features are more interpretable than the raw waveform, they are not necessarily optimal inputs for deep learning models. In this paper, inspired by the success of end-to-end time-domain learning in source separation, we propose a speaker feature extraction model for time-domain speaker verification that uses a temporal convolutional network (TCN) consisting of stacked 1-D dilated convolutional blocks combined with a squeeze-and-excitation (SE) block. Our experiments on the VoxCeleb1 dataset demonstrate that the proposed feature extraction model shows competitive performance compared to other networks operating on the raw waveform.
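The two building blocks named in the abstract, a 1-D dilated convolution and a squeeze-and-excitation (SE) channel gate, can be sketched as follows. This is a minimal NumPy illustration of the general techniques; the layer sizes, causal padding choice, and gate structure are assumptions for the example, not the configuration used in the thesis.

```python
import numpy as np

def dilated_conv1d(x, weights, dilation=1):
    """Causal 1-D dilated convolution.
    x: (in_ch, time); weights: (out_ch, in_ch, kernel).
    Left-pads so the output has the same length as the input."""
    out_ch, in_ch, k = weights.shape
    c, t_len = x.shape
    assert c == in_ch
    pad = (k - 1) * dilation  # receptive field grows with dilation
    xp = np.concatenate([np.zeros((c, pad)), x], axis=1)
    y = np.zeros((out_ch, t_len))
    for t in range(t_len):
        for j in range(k):
            # tap j looks back j * dilation samples
            y[:, t] += weights[:, :, j] @ xp[:, t + j * dilation]
    return y

def se_block(x, w1, w2):
    """Squeeze-and-excitation channel gate.
    x: (channels, time); w1: (bottleneck, channels); w2: (channels, bottleneck)."""
    z = x.mean(axis=1)                    # squeeze: global average over time
    s = np.maximum(w1 @ z, 0.0)           # excitation: bottleneck + ReLU
    g = 1.0 / (1.0 + np.exp(-(w2 @ s)))   # per-channel sigmoid gate in (0, 1)
    return x * g[:, None]                 # rescale each channel
```

Stacking several `dilated_conv1d` layers with dilations 1, 2, 4, ... gives the exponentially growing receptive field that makes a TCN suitable for long waveform inputs, while the SE gate lets the network reweight channels using global (utterance-level) context.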
URI
https://scholar.gist.ac.kr/handle/local/33034
Fulltext
http://gist.dcollection.net/common/orgView/200000909052
Authorize & License
  • Authorize: Open access
Files in This Item:
  • There are no files associated with this item.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.