
Acoustic Event Detection Based on Fast Regional Convolutional Neural Networks

Author(s)
Inyoung Park
Type
Thesis
Degree
Master
Department
Graduate School, School of Electrical Engineering and Computer Science
Advisor
Kim, Hong Kook
Abstract
In real environments, incident events are accompanied by complex mixtures of sound, and much of the information we receive from them is hard to interpret; extracting only the necessary information from such mixtures is not easy. Audio event detection (AED) techniques are used to detect the actual occurrence of these sounds and can help prioritize events. This thesis improves detection accuracy by comparing and analyzing three constituent methods of a sound event detection (SED) model.
Specifically, the proposed method improves detection accuracy on polyphonic audio, a persistent problem for AED models. Driven by the growing need for polyphonic audio event detection in security applications, several approaches to sound event detection have been proposed, including Gaussian mixture models (GMMs) and deep neural networks (DNNs). Recently, convolutional neural network (CNN)-based classification has become popular in speech and image processing. Such models can detect monophonic sounds effectively, but in overlapped ranges they detect only one event at a time while the others are missed entirely. Although hardware for deep learning has advanced remarkably, it is practically impossible to train a model on every polyphonic combination of events that occurs in real life. Thus, this thesis proposes an AED method for both monophonic and polyphonic audio based on the Fast Regional CNN (Fast R-CNN). When audio events overlap in time, the polyphonic audio events are detected region by region.
Therefore, this thesis proposes a method that extracts features from both monophonic and polyphonic acoustic signals captured by a microphone installed in an outdoor environment, and that performs time-series analysis robust to environmental changes, in order to counteract the loss of detection accuracy that such changes cause.
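The abstract gives no implementation details, so the following is only a minimal sketch of the region-based idea it describes, in the spirit of Fast R-CNN: a shared CNN backbone processes a log-mel spectrogram once, then each candidate time region is pooled to a fixed size and classified independently, which lets two overlapping events be detected in separate regions. The layer sizes, the background class, and the example regions are all assumptions, not the thesis architecture.

```python
# Minimal sketch (not the thesis implementation) of region-based audio event
# detection on a log-mel spectrogram. Fast R-CNN's key idea: compute a shared
# feature map once, then classify each candidate region via RoI pooling.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 9  # scream, explosion, glass breaking, ... (from the abstract)

class RegionAEDSketch(nn.Module):
    def __init__(self, n_mels=64, num_classes=NUM_CLASSES):
        super().__init__()
        # Shared backbone: run once per spectrogram, downsamples by 4.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.pooled_t = 8  # fixed RoI-pooling size along the time axis
        feat_mels = n_mels // 4
        # +1 output for a "background / no event" class, as in Fast R-CNN.
        self.classifier = nn.Linear(32 * feat_mels * self.pooled_t, num_classes + 1)

    def forward(self, spec, regions):
        # spec: (1, 1, n_mels, T) log-mel spectrogram; regions: list of (t0, t1)
        fmap = self.backbone(spec)  # (1, 32, n_mels/4, T/4)
        scores = []
        for t0, t1 in regions:
            # Map frame indices to feature-map columns (downsampling factor 4).
            f0, f1 = t0 // 4, max(t1 // 4, t0 // 4 + 1)
            roi = fmap[:, :, :, f0:f1]
            # RoI pooling: collapse a variable-length region to a fixed size.
            roi = F.adaptive_max_pool2d(roi, (fmap.shape[2], self.pooled_t))
            scores.append(self.classifier(roi.flatten(1)))
        return torch.cat(scores)  # (num_regions, num_classes + 1)

# Usage: two overlapping candidate regions are classified independently,
# so a polyphonic segment can yield two different event labels.
model = RegionAEDSketch()
spec = torch.randn(1, 1, 64, 400)              # fake 400-frame spectrogram
logits = model(spec, [(50, 180), (120, 300)])  # overlapping time regions
print(logits.shape)                            # torch.Size([2, 10])
```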
For this purpose, we gathered nine different audio events that can occur in real life: scream, explosion, glass breaking, gunshot, baby crying, car crash, car horn, siren, and tire skidding. Since it was hard to collect a sufficient number of real-life audio events for AED model training, we augmented the collected data by time-stretching it and by mixing it with noise.
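As a concrete illustration of these two augmentation steps, here is a minimal sketch using librosa. The sampling rate, stretch rates, SNR values, white noise source, and file name are illustrative assumptions; the abstract does not specify them.

```python
# Sketch of the two augmentation steps named in the abstract:
# (1) time-stretching and (2) mixing the signal with noise.
import numpy as np
import librosa

def augment(path, rates=(0.9, 1.1), snrs_db=(20, 10)):
    y, sr = librosa.load(path, sr=16000)
    variants = []
    # 1) Time-stretching: change duration without changing pitch.
    for rate in rates:
        variants.append(librosa.effects.time_stretch(y, rate=rate))
    # 2) Noise mixing: add white noise scaled to a target SNR.
    for snr_db in snrs_db:
        noise = np.random.randn(len(y)).astype(y.dtype)
        # Scale noise so that 10*log10(P_signal / P_noise) == snr_db.
        p_sig = np.mean(y ** 2)
        p_noise = np.mean(noise ** 2)
        noise *= np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
        variants.append(y + noise)
    return variants, sr

# Usage (hypothetical file name):
# variants, sr = augment("scream_001.wav")  # 4 augmented copies per clip
```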
The detection models are compared under different distance and noise conditions; every model is trained on the same training set and evaluated on the same test set so that the comparison is fair.
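The abstract does not name the evaluation metric. A segment-based F1 score is a common choice for AED benchmarks, so the sketch below is an assumption about how such a comparison could be scored, not the metric used in the thesis.

```python
# Segment-based F1: activity is judged per fixed-length segment and per class,
# so a model is rewarded for detecting both events in an overlapped segment.
import numpy as np

def segment_f1(ref, est):
    # ref, est: binary matrices (num_segments, num_classes); 1 = event active.
    tp = np.logical_and(ref == 1, est == 1).sum()
    fp = np.logical_and(ref == 0, est == 1).sum()
    fn = np.logical_and(ref == 1, est == 0).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Usage: 4 segments, 2 classes; the overlapped segment (row 1) is fully
# detected, one single-event segment (row 2) is missed.
ref = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
est = np.array([[1, 0], [1, 1], [0, 0], [0, 0]])
print(round(segment_f1(ref, est), 3))  # 0.857
```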
URI
https://scholar.gist.ac.kr/handle/local/32674
Fulltext
http://gist.dcollection.net/common/orgView/200000909944
Access & License
  • Access type: Open
File List
  • No related files available.
