Sound Event Detection Using Frequency Dynamic Convolution and Feature Fusion Network
- Abstract
- Sound Event Detection (SED) identifies the types and timestamps of events within audio clips and finds application in diverse areas such as audio captioning, wildlife tracking, and equipment monitoring. These applications rely on extracting and analyzing salient information from audio recordings. This thesis introduces an approach to enhance SED by combining frequency dynamic convolution (FDY) with large kernel attention (LKA) and a feature fusion network (FFN). Incorporating an LKA module into the FDY network improves the ability to detect sound events of varying lengths across an extended range of channels, leading to more accurate recognition and classification of sound events. Furthermore, this thesis introduces an FFN designed for efficient fusion of embeddings from different networks. The FFN combines embeddings through two branches, an integration branch and a correlation branch, where the correlation branch strengthens information exchange between embeddings through multiplication operations. This method effectively integrates embeddings from a transformer and a convolutional neural network, leading to more precise SED. The model was evaluated on the DCASE 2023 Challenge Task 4A dataset using the Polyphonic Sound Detection Score (PSDS); the evaluation measures the contribution of each module in the network and compares performance with recent models. Experimental results show that, on the DCASE 2023 Challenge Task 4A development evaluation dataset, the proposed model achieved improvements of 0.023 in PSDS 1 and 0.014 in PSDS 2 compared to a pre-trained audio neural network-FDY baseline.
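- The fusion scheme described in the abstract can be sketched in code. The following PyTorch snippet is a hypothetical illustration only, not the thesis implementation: the class name, layer choices, and dimensions are assumptions. It fuses a CNN embedding and a transformer embedding with an integration branch (concatenation plus projection) and a correlation branch that exchanges information through element-wise multiplication, as the abstract describes.

  # Hypothetical sketch of a feature fusion network with integration and
  # correlation branches; names, layers, and shapes are illustrative assumptions.
  import torch
  import torch.nn as nn

  class FeatureFusionNetwork(nn.Module):
      def __init__(self, dim: int):
          super().__init__()
          # Integration branch: concatenate both embeddings and project back to dim.
          self.integrate = nn.Linear(2 * dim, dim)
          # Correlation branch: multiplicative interaction between the embeddings.
          self.correlate = nn.Linear(dim, dim)
          self.norm = nn.LayerNorm(dim)

      def forward(self, cnn_emb: torch.Tensor, trans_emb: torch.Tensor) -> torch.Tensor:
          # Both inputs: (batch, time, dim) frame-level embeddings.
          integrated = self.integrate(torch.cat([cnn_emb, trans_emb], dim=-1))
          correlated = self.correlate(cnn_emb * trans_emb)  # element-wise multiplication
          return self.norm(integrated + correlated)

  # Usage example with illustrative shapes.
  fusion = FeatureFusionNetwork(dim=256)
  cnn_emb = torch.randn(4, 156, 256)    # e.g., output of the convolutional branch
  trans_emb = torch.randn(4, 156, 256)  # e.g., output of the transformer branch
  fused = fusion(cnn_emb, trans_emb)    # (4, 156, 256), passed to the SED classifier

  The additive combination of the two branches is one plausible design; the key point conveyed by the abstract is that the correlation branch introduces a multiplicative interaction between the two embeddings rather than relying on concatenation alone.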
- Author(s)
- Ji Won Kim
- Issued Date
- 2024
- Type
- Thesis
- URI
- https://scholar.gist.ac.kr/handle/local/19685