
Exploring Inductive Bias in Deep Neural Networks for Visual Perception

Metadata
Author(s)
Shikha Dubey
Type
Thesis
Degree
Doctor
Department
School of Electrical Engineering and Computer Science, Graduate School
Advisor
Jeon, Moongu
Abstract
Over the past decades, inductive biases have played a vital role in the success of deep learning algorithms. Progress toward general artificial intelligence involves counterbalancing an algorithm's inductive bias and tuning the algorithm for out-of-distribution generalization. A conspicuous impact of inductive bias is the continuing trend of improving deep learning performance. In this thesis, I explore inductive biases in deep neural networks and their effects on visual perception. Furthermore, I examine the effects of inductive biases on several visual perception applications, including visual anomaly detection, image captioning, and object detection.
Achieving generalization in anomalous activity detection is challenging due to the scarcity of annotated datasets. Moreover, learning the context dependency of anomalous events and mitigating the false alarms of a deep network require well-defined inductive biases. Therefore, this thesis first proposes a framework, Deep-network with Multiple Ranking Measures (DMRMs), which addresses context dependency "softly" in the deep neural network using a joint learning technique for motion and appearance features. The framework introduces multiple ranking measures (MRMs) as an inductive bias that lets the network learn context dependency in a weakly supervised manner. Experimental results on two recent challenging datasets, UCF-Crime and ShanghaiTech, demonstrate that the proposed framework generalizes well to the anomaly detection task and helps mitigate false alarm rates.
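A common form of ranking measure in weakly supervised anomaly detection is a multiple-instance ranking loss that scores the most anomalous segment of an abnormal video above any segment of a normal video, regularized by temporal smoothness and sparsity. The following is a minimal illustrative sketch of that idea, not the DMRMs implementation; the function name and weight values are assumptions.

```python
import numpy as np

def mil_ranking_loss(pos_scores, neg_scores, margin=1.0,
                     smooth_w=8e-5, sparse_w=8e-5):
    """Illustrative multiple-instance ranking loss (not the thesis code).

    pos_scores: anomaly scores of segments from an anomalous video.
    neg_scores: anomaly scores of segments from a normal video.
    """
    # Hinge ranking term: top anomalous segment should outrank top normal one.
    rank = max(0.0, margin - pos_scores.max() + neg_scores.max())
    # Temporal smoothness: penalize abrupt score changes between segments.
    smooth = float(np.sum(np.diff(pos_scores) ** 2))
    # Sparsity: anomalies should occupy few segments of the video.
    sparse = float(pos_scores.sum())
    return rank + smooth_w * smooth + sparse_w * sparse

loss = mil_ranking_loss(np.array([0.1, 0.9, 0.3]),
                        np.array([0.2, 0.1, 0.15]))
```

The smoothness and sparsity terms act as additional soft constraints on the score profile, which is one way a ranking-based inductive bias can suppress isolated false alarms.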
I further explore the effect of inductive biases on the image captioning task. Automatic image captioning describes the meaningful content of an image using computer vision techniques, and existing techniques generally rely on the hard inductive bias of encoder-decoder architectures. Consequently, I investigate the effect of combining hard and soft inductive biases in a transformer architecture. First, I demonstrate the significance of objects' relevance to their surrounding environment, introduced as a soft inductive bias, while object proposals are employed as a hard inductive bias. Next, I include a soft inductive bias that learns an explicit association between vision and language constructs; this tolerates variation between an object's class and its associated dictionary word (label) in multi-label classification. Thus, this thesis proposes a label-attention transformer with geometrically coherent objects (LATGeO) for image captioning. Object coherence is defined using localized ratios of the geometrical properties of the proposals, and the soft inductive bias, a label-attention module (LAM), associates the extracted object classes with the available dictionary using self-attention layers. Experimental results show that including both inductive biases in the proposed network helps generate meaningful captions. The framework is tested on the MS COCO dataset, where a thorough evaluation yields overall better quantitative scores than competing methods.
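Geometric coherence between object proposals can be sketched as a pairwise relevance matrix built from localized ratios of box geometry, for instance center distance normalized by box size and an area ratio. The code below is an illustrative sketch of that general idea under assumed definitions, not the LATGeO formulation.

```python
import numpy as np

def geometric_coherence(boxes):
    """Illustrative pairwise coherence from proposal geometry (assumed form).

    boxes: list of (x1, y1, x2, y2) object proposals.
    Returns an (n, n) matrix; higher values mean more coherent pairs.
    """
    b = np.asarray(boxes, dtype=float)
    cx = (b[:, 0] + b[:, 2]) / 2.0           # box centers
    cy = (b[:, 1] + b[:, 3]) / 2.0
    area = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    diag = np.hypot(b[:, 2] - b[:, 0], b[:, 3] - b[:, 1])
    # Center distance localized by the reference box's diagonal.
    dist = np.hypot(cx[:, None] - cx[None, :], cy[:, None] - cy[None, :])
    closeness = np.exp(-dist / (diag[:, None] + 1e-6))
    # Scale similarity as a min/max area ratio in [0, 1].
    area_ratio = (np.minimum(area[:, None], area[None, :]) /
                  (np.maximum(area[:, None], area[None, :]) + 1e-6))
    return closeness * area_ratio
```

A matrix like this could weight attention between proposal features, so that nearby, similarly sized objects inform each other's captioned relationships more strongly than distant, dissimilar ones.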
Although DETR, a quintessential attention-based object detection technique, is more accurate than its predecessors, its accuracy deteriorates when detecting small (in-perspective) objects. Finally, I examine the inductive bias of DETR and propose a normalized inductive bias for object detection using data fusion, SOF-DETR. Applying SOF-DETR to the MS COCO and Udacity Self-Driving Car datasets demonstrates the effectiveness of the added normalized inductive bias and feature fusion techniques, showing increased COCO mAP scores on small objects.
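One data-fusion pattern that helps transformer detectors with small objects is combining a high-resolution, low-level feature map with an upsampled low-resolution, high-level map before the encoder. The sketch below shows this generic multi-scale fusion with nearest-neighbor upsampling and channel concatenation; it is an assumed illustration, not the SOF-DETR architecture.

```python
import numpy as np

def fuse_multiscale(f_low, f_high):
    """Illustrative multi-scale feature fusion (assumed form).

    f_low:  (H, W, C) high-resolution, low-level feature map.
    f_high: (H/k, W/k, C) low-resolution, high-level feature map.
    Returns an (H, W, 2C) fused map.
    """
    h, w = f_low.shape[:2]
    scale_h = h // f_high.shape[0]
    scale_w = w // f_high.shape[1]
    # Nearest-neighbor upsampling of the coarse, semantically rich map.
    up = np.repeat(np.repeat(f_high, scale_h, axis=0), scale_w, axis=1)
    # Channel-wise concatenation keeps fine spatial detail for small objects.
    return np.concatenate([f_low, up], axis=-1)
```

Keeping the fine-grained map in the fused representation is what preserves the spatial detail that small objects need, which the coarse map alone loses.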
URI
https://scholar.gist.ac.kr/handle/local/19274
Fulltext
http://gist.dcollection.net/common/orgView/200000883099
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.