Sound Event Detection Using Frequency Dynamic Convolution and Feature Fusion Network
- Abstract
- Sound Event Detection (SED) identifies the types and timestamps of events within audio clips and finds application in diverse areas such as audio captioning, wildlife tracking, and equipment monitoring. These applications rely on extracting and analyzing salient information from audio recordings. This thesis introduces an approach to enhance SED by combining frequency dynamic convolution (FDY) with large kernel attention (LKA) and a feature fusion network (FFN). Incorporating an LKA module into the FDY network improves the ability to detect sound events of varying lengths across an extended range of channels, leading to more accurate recognition and classification of sound events. Furthermore, this thesis introduces an FFN designed for efficient fusion of embeddings from different networks. The FFN combines embeddings through two branches, an integration branch and a correlation branch, where the correlation branch strengthens information exchange between embeddings through multiplication operations. This method effectively integrates embeddings from a transformer and a convolutional neural network, leading to more precise SED. The model was evaluated on the DCASE 2023 Challenge Task 4A dataset using the Polyphonic Sound Detection Score (PSDS); the evaluation measures the contribution of each module in the network and compares performance with recent models. Experimental results show that, on the DCASE 2023 Challenge Task 4A development evaluation dataset, the proposed model achieved improvements of 0.023 in PSDS 1 and 0.014 in PSDS 2 compared to a pre-trained audio neural network-FDY baseline.
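- The fusion scheme described in the abstract can be sketched in code. The following PyTorch snippet is a hypothetical illustration only, not the thesis implementation: the class name, layer choices, and dimensions are assumptions. It fuses a CNN embedding and a transformer embedding with an integration branch (concatenation plus projection) and a correlation branch that exchanges information through element-wise multiplication, as the abstract describes.

  # Hypothetical sketch of a feature fusion network with integration and
  # correlation branches; names, layers, and shapes are illustrative assumptions.
  import torch
  import torch.nn as nn

  class FeatureFusionNetwork(nn.Module):
      def __init__(self, dim: int):
          super().__init__()
          # Integration branch: concatenate both embeddings and project back to dim.
          self.integrate = nn.Linear(2 * dim, dim)
          # Correlation branch: multiplicative interaction between the embeddings.
          self.correlate = nn.Linear(dim, dim)
          self.norm = nn.LayerNorm(dim)

      def forward(self, cnn_emb: torch.Tensor, trans_emb: torch.Tensor) -> torch.Tensor:
          # Both inputs: (batch, time, dim) frame-level embeddings.
          integrated = self.integrate(torch.cat([cnn_emb, trans_emb], dim=-1))
          correlated = self.correlate(cnn_emb * trans_emb)  # element-wise multiplication
          return self.norm(integrated + correlated)

  # Usage example with illustrative shapes.
  fusion = FeatureFusionNetwork(dim=256)
  cnn_emb = torch.randn(4, 156, 256)    # e.g., output of the convolutional branch
  trans_emb = torch.randn(4, 156, 256)  # e.g., output of the transformer branch
  fused = fusion(cnn_emb, trans_emb)    # (4, 156, 256), passed to the SED classifier

  The additive combination of the two branches is one plausible design; the key point conveyed by the abstract is that the correlation branch introduces a multiplicative interaction between the two embeddings rather than relying on concatenation alone.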
- Author(s)
- Ji Won Kim
- Issued Date
- 2024
- Type
- Thesis
- URI
- https://scholar.gist.ac.kr/handle/local/19685