OAK

Label-attention transformer with geometrically coherent objects for image captioning

Metadata Downloads
Abstract
Encoder-decoder-based image captioning techniques are generally utilized to describe meaningful information present in an image. In this work, we investigate two unexplored ideas for image captioning using the transformer: 1) an object-focused label attention module (LAM), and 2) a geometrically coherent proposal (GCP) module that focuses on the scale and position of objects to benefit the transformer model by attaining better image perception. These modules demonstrate the enforcement of objects’ relevance in the surrounding environment. Furthermore, they explore the effectiveness of learning an explicit association between vision and language constructs. LAM and GCP tolerate the variation in objects’ class and its association with labels in multi-label classification. The proposed framework, label-attention transformer with geometrically coherent objects (LATGeO), acquires proposals of geometrically coherent objects using a deep neural network (DNN) and generates captions by investigating their relationships using LAM. The module LAM associates the extracted objects classes to the available dictionary using self-attention layers. Object coherence is acquired in the GCP module using the localized ratio of the proposals’ geometrical features. In this study, experimentation results are performed on MSCOCO dataset. The evaluation of LATGeO on MSCOCO advocates that objects’ relevance in surroundings and their visual features binding with geometrically localized ratios and associated labels generate improved and meaningful captions. © 2022
Author(s)
Dubey, Shikha.Olimov, F.Rafique, M.A.Kim, J.Jeon, Moongu
Issued Date
2023-04
Type
Article
DOI
10.1016/j.ins.2022.12.018
URI
https://scholar.gist.ac.kr/handle/local/10276
Publisher
Elsevier BV
Citation
Information Sciences, v.623, pp.812 - 831
ISSN
0020-0255
Appears in Collections:
Department of Electrical Engineering and Computer Science > 1. Journal Articles
공개 및 라이선스
  • 공개 구분공개
파일 목록

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.