Visual Attention Network with 1D Large Kernel for Speaker Verification

Author(s): YECHAN YU

Type: Thesis

Degree: Master

Department: Graduate School, School of Electrical Engineering and Computer Science

Advisor: Kim, Hong Kook

Abstract: This thesis proposes MFA-VAN, a model for speaker verification that replaces self-attention with a Visual Attention Network (VAN) built on 1D Large Kernel Attention (LKA). To use the 1D large kernel efficiently, LKA decomposes it into three convolution modules: a depth-wise convolution, a depth-wise dilated convolution, and a point-wise convolution. LKA extracts local and global features simultaneously, provides channel adaptability, and has lower computational complexity than self-attention.

On the VoxCeleb1-O evaluation set, the proposed model improves the EER by 0.49% over MFA-Transformer, a Transformer-based model. In addition, across three VoxCeleb1 evaluation sets and across varying frame lengths, it achieves nearly the same performance as MFA-Conformer, the state-of-the-art attention-based model, with about half the model parameters.
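The decomposition described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the thesis implementation. The kernel-size rule (a depth-wise kernel of size 2d - 1 and a depth-wise dilated kernel of size ceil(K / d)) and the final element-wise multiplication of the attention map with the input follow the original VAN formulation. The values K = 21 and d = 3, the all-ones weights, and every function name are assumptions chosen for illustration only.

```python
from math import ceil


def decomposed_sizes(K, d):
    """Kernel sizes of the depth-wise and depth-wise dilated convolutions
    that together approximate one large kernel of size K (VAN-style rule)."""
    return 2 * d - 1, ceil(K / d)


def receptive_field(K, d):
    """Number of time steps covered by the two stacked depth-wise convolutions."""
    k_dw, k_dwd = decomposed_sizes(K, d)
    return k_dw + d * (k_dwd - 1)


def depthwise_conv1d(x, kernels, dilation=1):
    """Per-channel 1D cross-correlation with zero 'same' padding.
    x: C channels, each a list of T floats; kernels: C odd-length kernels."""
    out = []
    for channel, kernel in zip(x, kernels):
        t = len(channel)
        pad = dilation * (len(kernel) - 1) // 2
        row = []
        for i in range(t):
            acc = 0.0
            for j, w in enumerate(kernel):
                idx = i - pad + j * dilation
                if 0 <= idx < t:
                    acc += w * channel[idx]
            row.append(acc)
        out.append(row)
    return out


def pointwise_conv1d(x, weight):
    """Kernel-size-1 convolution; weight[o][c] mixes input channel c into output o."""
    t = len(x[0])
    return [[sum(weight[o][c] * x[c][i] for c in range(len(x))) for i in range(t)]
            for o in range(len(weight))]


def lka_attention_map(x, dw_kernels, dwd_kernels, pw_weight, dilation):
    """Depth-wise -> depth-wise dilated -> point-wise, in the order the abstract lists."""
    attn = depthwise_conv1d(x, dw_kernels)
    attn = depthwise_conv1d(attn, dwd_kernels, dilation=dilation)
    return pointwise_conv1d(attn, pw_weight)


def lka_1d(x, dw_kernels, dwd_kernels, pw_weight, dilation):
    """Gate the input element-wise with the attention map (assumed from VAN)."""
    attn = lka_attention_map(x, dw_kernels, dwd_kernels, pw_weight, dilation)
    return [[a * v for a, v in zip(a_row, x_row)] for a_row, x_row in zip(attn, x)]


# Illustrative setup: one channel, 41 frames, a single impulse in the middle.
K, d = 21, 3                       # assumed values, not taken from the thesis
k_dw, k_dwd = decomposed_sizes(K, d)
x = [[0.0] * 41]
x[0][20] = 1.0
dw = [[1.0] * k_dw]                # all-ones weights make the coverage visible
dwd = [[1.0] * k_dwd]
pw = [[1.0]]
attn = lka_attention_map(x, dw, dwd, pw, dilation=d)
covered = sum(1 for v in attn[0] if v != 0.0)
print(k_dw, k_dwd, covered)        # kernel sizes and frames reached by the impulse
```

With these assumed values, two small kernels of sizes 5 and 7 reach as many frames as `receptive_field(K, d)` reports, which is how the decomposition covers long-range context while the depth-wise layers keep one set of weights per channel.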

URI: https://scholar.gist.ac.kr/handle/local/19887

Fulltext: http://gist.dcollection.net/common/orgView/200000883614

Alternative Author(s): 유예찬

Appears in Collections: Department of Electrical Engineering and Computer Science > 3. Theses(Master)

Access and License

Access Type: Open
