Feature Extraction Using Temporal Convolutional Network for Time-domain Speaker Verification
- Author(s)
- Sangwook Han
- Type
- Thesis
- Degree
- Master
- Department
- Graduate School, School of Electrical Engineering and Computer Science
- Advisor
- Shin, Jong Won
- Abstract
- Recently, end-to-end speaker verification systems using deep neural networks have been widely studied. However, many studies in speaker verification have used hand-crafted features such as mel-filterbank energies, mel-frequency cepstral coefficients, and spectrograms. Although these features carry more interpretable information than the raw waveform, they are not optimal inputs for deep learning models. In this paper, inspired by the success of end-to-end time-domain learning in source separation, we propose a speaker feature extraction model for time-domain speaker verification that uses a temporal convolutional network (TCN) consisting of stacked 1-D dilated convolutional blocks combined with a squeeze-and-excitation (SE) block. Our experiments on the VoxCeleb1 dataset demonstrate that the proposed feature extraction model shows competitive performance compared to other networks operating on the raw waveform.
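The two building blocks named in the abstract, dilated 1-D convolution and squeeze-and-excitation gating, can be sketched in plain Python. This is a minimal illustration of the operations, not the thesis's actual model: kernel values, the causal padding choice, and the toy sigmoid gate (standing in for the learned SE bottleneck MLP) are all assumptions for demonstration.

```python
import math

def dilated_conv1d(x, kernel, dilation=1):
    """Causal 1-D dilated convolution over a list of samples.
    Positions before the start of the signal are zero-padded, so the
    output has the same length as the input."""
    k = len(kernel)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - (k - 1 - i) * dilation  # dilation widens the receptive field
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

def squeeze_excite(channels):
    """SE-style channel gating: global average pooling per channel
    ("squeeze"), then rescaling each channel by a gate ("excite").
    A toy sigmoid of the channel mean replaces the learned MLP here."""
    rescaled = []
    for ch in channels:
        mean = sum(ch) / len(ch)              # squeeze: global average pool
        gate = 1.0 / (1.0 + math.exp(-mean))  # excite: illustrative sigmoid gate
        rescaled.append([gate * v for v in ch])
    return rescaled
```

Stacking such dilated blocks with exponentially growing dilation (1, 2, 4, ...) is what gives a TCN a large receptive field over the raw waveform, while the SE gate reweights channels by their global statistics.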
- URI
- https://scholar.gist.ac.kr/handle/local/33034
- Fulltext
- http://gist.dcollection.net/common/orgView/200000909052