U-Net Based Audio-Visual Anomaly Detection

Author(s): IlHoon Song

Type: Thesis

Degree: Master

Department: Graduate School, Department of Electrical Engineering and Computer Science

Advisor: Kim, Hong Kook

Abstract: Anomaly detection refers to the identification of events that do not conform to expected behavior. It is used in surveillance systems, where it compensates for the shortcomings of unmanned surveillance by predicting the occurrence of crime. Within anomaly detection, audio and video methods are widely used because they correspond to the ears and eyes. However, relying on single-modal information alone has several limitations. Anomaly detection also remains a hard problem because abnormal events are unbounded and often ambiguous to define in real applications. This thesis therefore proposes an audio-visual anomaly detection method based on a U-Net and a support vector machine (SVM) classifier. First, to overcome the limitations of single-modal audio or video information in surveillance systems, we adopt multi-modal, audio-visual anomaly detection. Second, to identify abnormal events that are hard to define in real applications, the model is trained only on normal events, and at inference any prediction that deviates from the trained features is defined as an anomaly. To this end, a U-Net architecture is used for the sub-sampling stage and an SVM for the anomaly decision. Performance is evaluated with the area under the curve (AUC) as the frame-based measure and the F1-score as the event-based measure. In the single-modal audio or video setting, performance was similar to that of existing anomaly detection models. In the multi-modal audio-visual setting, the method achieved an F1-score of more than 90% on both the simulated and the real datasets.
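The two measures named in the abstract can be illustrated with a minimal sketch. It is not the thesis's evaluation code: the function names, the toy labels and scores, and the 0.5 threshold are assumptions made for illustration, and the F1 here is computed over one binary decision per item, which may differ from how the thesis delimits events.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen anomalous item
    scores higher than a randomly chosen normal one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one item of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels, predictions):
    """F1 as the harmonic mean of precision and recall for binary decisions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = anomalous, 0 = normal; scores are anomaly scores.
labels = [0, 0, 0, 1, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.6, 0.7, 0.9, 0.3, 0.4, 0.8]

# Assumed decision threshold of 0.5 on the anomaly score.
predictions = [1 if s >= 0.5 else 0 for s in scores]

print("AUC:", roc_auc(labels, scores))
print("F1:", f1_score(labels, predictions))
```

AUC is threshold-free and summarizes how well the scores rank anomalous items above normal ones, whereas F1 depends on the chosen decision threshold.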

URI: https://scholar.gist.ac.kr/handle/local/19873

Fulltext: http://gist.dcollection.net/common/orgView/200000884877

Alternative Author(s): 송일훈

Appears in Collections: Department of Electrical Engineering and Computer Science > 3. Theses(Master)

Access and License

Access Type: Public
