
Acoustic Event Detection Based on Fast Regional Convolutional Neural Networks

Author(s)
Inyoung Park
Type
Thesis
Degree
Master
Department
Graduate School, School of Electrical Engineering and Computer Science
Advisor
Kim, Hong Kook
Abstract
In real environments, incident events are accompanied by complex mixtures of sound, and much of the information we receive from them is hard to interpret; extracting only the necessary information from such mixtures is not easy. Audio event detection (AED) techniques are used to detect the actual occurrence of these sounds and can help prioritize events. This thesis improves detection accuracy by comparing and analyzing three constituent methods of a sound event detection (SED) model.
Specifically, the proposed method improves detection accuracy on polyphonic audio, a persistent problem for AED models. Driven by the growing need for polyphonic audio event detection in security applications, several approaches to sound event detection have been proposed, including Gaussian mixture models (GMMs) and deep neural networks (DNNs). Recently, convolutional neural network (CNN)-based classification has become popular in speech and image processing. Such models can detect monophonic sounds effectively, but in overlapped ranges they detect only one event at a time while the others are missed entirely. Although hardware for deep learning has advanced remarkably, it is practically impossible to train a model on every polyphonic combination of events that occurs in real life. Thus, this thesis proposes an AED method for both monophonic and polyphonic audio based on the Fast Regional CNN (Fast R-CNN). When audio events overlap in time, the polyphonic audio events are detected region by region.
Therefore, this thesis proposes a method that extracts features from both monophonic and polyphonic acoustic signals captured by a microphone installed in an outdoor environment, and that performs time-series analysis robust to environmental changes, in order to counteract the loss of detection accuracy that such changes cause.
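The abstract gives no implementation details, so the following is only a minimal sketch of the region-based idea it describes, in the spirit of Fast R-CNN: a shared CNN backbone processes a log-mel spectrogram once, then each candidate time region is pooled to a fixed size and classified independently, which lets two overlapping events be detected in separate regions. The layer sizes, the background class, and the example regions are all assumptions, not the thesis architecture.

```python
# Minimal sketch (not the thesis implementation) of region-based audio event
# detection on a log-mel spectrogram. Fast R-CNN's key idea: compute a shared
# feature map once, then classify each candidate region via RoI pooling.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 9  # scream, explosion, glass breaking, ... (from the abstract)

class RegionAEDSketch(nn.Module):
    def __init__(self, n_mels=64, num_classes=NUM_CLASSES):
        super().__init__()
        # Shared backbone: run once per spectrogram, downsamples by 4.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.pooled_t = 8  # fixed RoI-pooling size along the time axis
        feat_mels = n_mels // 4
        # +1 output for a "background / no event" class, as in Fast R-CNN.
        self.classifier = nn.Linear(32 * feat_mels * self.pooled_t, num_classes + 1)

    def forward(self, spec, regions):
        # spec: (1, 1, n_mels, T) log-mel spectrogram; regions: list of (t0, t1)
        fmap = self.backbone(spec)  # (1, 32, n_mels/4, T/4)
        scores = []
        for t0, t1 in regions:
            # Map frame indices to feature-map columns (downsampling factor 4).
            f0, f1 = t0 // 4, max(t1 // 4, t0 // 4 + 1)
            roi = fmap[:, :, :, f0:f1]
            # RoI pooling: collapse a variable-length region to a fixed size.
            roi = F.adaptive_max_pool2d(roi, (fmap.shape[2], self.pooled_t))
            scores.append(self.classifier(roi.flatten(1)))
        return torch.cat(scores)  # (num_regions, num_classes + 1)

# Usage: two overlapping candidate regions are classified independently,
# so a polyphonic segment can yield two different event labels.
model = RegionAEDSketch()
spec = torch.randn(1, 1, 64, 400)              # fake 400-frame spectrogram
logits = model(spec, [(50, 180), (120, 300)])  # overlapping time regions
print(logits.shape)                            # torch.Size([2, 10])
```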
For this purpose, we gathered nine different audio events that can occur in real life: scream, explosion, glass breaking, gunshot, baby crying, car crash, car horn, siren, and tire skidding. Since it was hard to collect a sufficient number of real-life audio events for AED model training, we augmented the collected data by time-stretching it and by mixing it with noise.
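As a concrete illustration of these two augmentation steps, here is a minimal sketch using librosa. The sampling rate, stretch rates, SNR values, white noise source, and file name are illustrative assumptions; the abstract does not specify them.

```python
# Sketch of the two augmentation steps named in the abstract:
# (1) time-stretching and (2) mixing the signal with noise.
import numpy as np
import librosa

def augment(path, rates=(0.9, 1.1), snrs_db=(20, 10)):
    y, sr = librosa.load(path, sr=16000)
    variants = []
    # 1) Time-stretching: change duration without changing pitch.
    for rate in rates:
        variants.append(librosa.effects.time_stretch(y, rate=rate))
    # 2) Noise mixing: add white noise scaled to a target SNR.
    for snr_db in snrs_db:
        noise = np.random.randn(len(y)).astype(y.dtype)
        # Scale noise so that 10*log10(P_signal / P_noise) == snr_db.
        p_sig = np.mean(y ** 2)
        p_noise = np.mean(noise ** 2)
        noise *= np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
        variants.append(y + noise)
    return variants, sr

# Usage (hypothetical file name):
# variants, sr = augment("scream_001.wav")  # 4 augmented copies per clip
```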
The detection models are compared under different distance and noise conditions; every model is trained on the same training set and evaluated on the same test set so that the comparison is fair.
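The abstract does not name the evaluation metric. A segment-based F1 score is a common choice for AED benchmarks, so the sketch below is an assumption about how such a comparison could be scored, not the metric used in the thesis.

```python
# Segment-based F1: activity is judged per fixed-length segment and per class,
# so a model is rewarded for detecting both events in an overlapped segment.
import numpy as np

def segment_f1(ref, est):
    # ref, est: binary matrices (num_segments, num_classes); 1 = event active.
    tp = np.logical_and(ref == 1, est == 1).sum()
    fp = np.logical_and(ref == 0, est == 1).sum()
    fn = np.logical_and(ref == 1, est == 0).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Usage: 4 segments, 2 classes; the overlapped segment (row 1) is fully
# detected, one single-event segment (row 2) is missed.
ref = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
est = np.array([[1, 0], [1, 1], [0, 0], [0, 0]])
print(round(segment_f1(ref, est), 3))  # 0.857
```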
URI
https://scholar.gist.ac.kr/handle/local/32674
Fulltext
http://gist.dcollection.net/common/orgView/200000909944
Access & License
  • Access type: Open
File List
  • No related files available.
