
Exploring Inductive Bias in Deep Neural Networks for Visual Perception

Metadata
Author(s)
Shikha Dubey
Type
Thesis
Degree
Doctor
Department
School of Electrical Engineering and Computer Science, Graduate School
Advisor
Jeon, Moongu
Abstract
Over the past decades, inductive biases have played a vital role in the success of deep learning algorithms. Progress toward general artificial intelligence involves counterbalancing an algorithm's inductive bias and tuning the algorithm for out-of-distribution generalization. A conspicuous impact of inductive bias is the continuing trend of improving deep learning performance. In this thesis, I explore inductive biases in deep neural networks and their effects on visual perception. Furthermore, I examine the effects of inductive biases on several visual perception applications, including visual anomaly detection, image captioning, and object detection.
Achieving generalization in anomalous activity detection is challenging due to the scarcity of annotated datasets. Moreover, learning the context dependency of anomalous events and mitigating the false alarms of a deep network require well-defined inductive biases. Therefore, this thesis first proposes a framework, Deep-network with Multiple Ranking Measures (DMRMs), which addresses context dependency "softly" in the deep neural network using a joint learning technique for motion and appearance features. The framework introduces multiple ranking measures (MRMs) as an inductive bias that lets the network learn context dependency in a weakly supervised manner. Experimental results on two recent challenging datasets, UCF-Crime and ShanghaiTech, demonstrate that the proposed framework generalizes well to the anomaly detection task and helps mitigate false alarm rates.
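A common form of ranking measure in weakly supervised anomaly detection is a multiple-instance ranking loss that scores the most anomalous segment of an abnormal video above any segment of a normal video, regularized by temporal smoothness and sparsity. The following is a minimal illustrative sketch of that idea, not the DMRMs implementation; the function name and weight values are assumptions.

```python
import numpy as np

def mil_ranking_loss(pos_scores, neg_scores, margin=1.0,
                     smooth_w=8e-5, sparse_w=8e-5):
    """Illustrative multiple-instance ranking loss (not the thesis code).

    pos_scores: anomaly scores of segments from an anomalous video.
    neg_scores: anomaly scores of segments from a normal video.
    """
    # Hinge ranking term: top anomalous segment should outrank top normal one.
    rank = max(0.0, margin - pos_scores.max() + neg_scores.max())
    # Temporal smoothness: penalize abrupt score changes between segments.
    smooth = float(np.sum(np.diff(pos_scores) ** 2))
    # Sparsity: anomalies should occupy few segments of the video.
    sparse = float(pos_scores.sum())
    return rank + smooth_w * smooth + sparse_w * sparse

loss = mil_ranking_loss(np.array([0.1, 0.9, 0.3]),
                        np.array([0.2, 0.1, 0.15]))
```

The smoothness and sparsity terms act as additional soft constraints on the score profile, which is one way a ranking-based inductive bias can suppress isolated false alarms.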
I further explore the effect of inductive biases on the image captioning task. Automatic image captioning describes the meaningful content of an image using computer vision techniques, and existing techniques generally rely on the hard inductive bias of encoder-decoder architectures. Consequently, I investigate the effect of combining hard and soft inductive biases in a transformer architecture. First, I demonstrate the significance of objects' relevance to their surrounding environment, introduced as a soft inductive bias, while object proposals are employed as a hard inductive bias. Next, I include a soft inductive bias that learns an explicit association between vision and language constructs; this tolerates variation between an object's class and its associated dictionary word (label) in multi-label classification. Thus, this thesis proposes a label-attention transformer with geometrically coherent objects (LATGeO) for image captioning. Object coherence is defined using localized ratios of the geometrical properties of the proposals, and the soft inductive bias, a label-attention module (LAM), associates the extracted object classes with the available dictionary using self-attention layers. Experimental results show that including both inductive biases in the proposed network helps generate meaningful captions. The framework is tested on the MS COCO dataset, where a thorough evaluation yields overall better quantitative scores than competing methods.
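Geometric coherence between object proposals can be sketched as a pairwise relevance matrix built from localized ratios of box geometry, for instance center distance normalized by box size and an area ratio. The code below is an illustrative sketch of that general idea under assumed definitions, not the LATGeO formulation.

```python
import numpy as np

def geometric_coherence(boxes):
    """Illustrative pairwise coherence from proposal geometry (assumed form).

    boxes: list of (x1, y1, x2, y2) object proposals.
    Returns an (n, n) matrix; higher values mean more coherent pairs.
    """
    b = np.asarray(boxes, dtype=float)
    cx = (b[:, 0] + b[:, 2]) / 2.0           # box centers
    cy = (b[:, 1] + b[:, 3]) / 2.0
    area = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    diag = np.hypot(b[:, 2] - b[:, 0], b[:, 3] - b[:, 1])
    # Center distance localized by the reference box's diagonal.
    dist = np.hypot(cx[:, None] - cx[None, :], cy[:, None] - cy[None, :])
    closeness = np.exp(-dist / (diag[:, None] + 1e-6))
    # Scale similarity as a min/max area ratio in [0, 1].
    area_ratio = (np.minimum(area[:, None], area[None, :]) /
                  (np.maximum(area[:, None], area[None, :]) + 1e-6))
    return closeness * area_ratio
```

A matrix like this could weight attention between proposal features, so that nearby, similarly sized objects inform each other's captioned relationships more strongly than distant, dissimilar ones.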
Although DETR, a quintessential attention-based object detection technique, is more accurate than its predecessors, its accuracy deteriorates when detecting small (in-perspective) objects. Finally, I examine the inductive bias of DETR and propose a normalized inductive bias for object detection using data fusion, SOF-DETR. Applying SOF-DETR to the MS COCO and Udacity Self-Driving Car datasets demonstrates the effectiveness of the added normalized inductive bias and feature fusion techniques, showing increased COCO mAP scores on small objects.
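One data-fusion pattern that helps transformer detectors with small objects is combining a high-resolution, low-level feature map with an upsampled low-resolution, high-level map before the encoder. The sketch below shows this generic multi-scale fusion with nearest-neighbor upsampling and channel concatenation; it is an assumed illustration, not the SOF-DETR architecture.

```python
import numpy as np

def fuse_multiscale(f_low, f_high):
    """Illustrative multi-scale feature fusion (assumed form).

    f_low:  (H, W, C) high-resolution, low-level feature map.
    f_high: (H/k, W/k, C) low-resolution, high-level feature map.
    Returns an (H, W, 2C) fused map.
    """
    h, w = f_low.shape[:2]
    scale_h = h // f_high.shape[0]
    scale_w = w // f_high.shape[1]
    # Nearest-neighbor upsampling of the coarse, semantically rich map.
    up = np.repeat(np.repeat(f_high, scale_h, axis=0), scale_w, axis=1)
    # Channel-wise concatenation keeps fine spatial detail for small objects.
    return np.concatenate([f_low, up], axis=-1)
```

Keeping the fine-grained map in the fused representation is what preserves the spatial detail that small objects need, which the coarse map alone loses.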
URI
https://scholar.gist.ac.kr/handle/local/19274
Fulltext
http://gist.dcollection.net/common/orgView/200000883099
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.