Label-attention transformer with geometrically coherent objects for image captioning

Author(s): Dubey, Shikha; Olimov, F.; Rafique, M.A.; Kim, J.; Jeon, Moongu

Type: Article

Citation: Information Sciences, v.623, pp.812 - 831

Issued Date: 2023-04

Abstract: Encoder-decoder-based image captioning techniques are generally used to describe the meaningful information present in an image. In this work, we investigate two unexplored ideas for image captioning using the transformer: 1) an object-focused label attention module (LAM), and 2) a geometrically coherent proposal (GCP) module that focuses on the scale and position of objects, giving the transformer model better image perception. These modules enforce objects’ relevance to their surrounding environment and explore the effectiveness of learning an explicit association between vision and language constructs. LAM and GCP tolerate variation in objects’ classes and in their association with labels in multi-label classification. The proposed framework, label-attention transformer with geometrically coherent objects (LATGeO), acquires proposals of geometrically coherent objects using a deep neural network (DNN) and generates captions by investigating their relationships using LAM. LAM associates the extracted object classes with the available dictionary using self-attention layers. Object coherence is acquired in the GCP module using the localized ratio of the proposals’ geometrical features. Experiments are conducted on the MSCOCO dataset. The evaluation of LATGeO on MSCOCO indicates that objects’ relevance to their surroundings, together with the binding of their visual features to geometrically localized ratios and associated labels, yields improved and meaningful captions. © 2022

Publisher: Elsevier BV

ISSN: 0020-0255

DOI: 10.1016/j.ins.2022.12.018

URI: https://scholar.gist.ac.kr/handle/local/10276

Appears in Collections: Department of Electrical Engineering and Computer Science > 1. Journal Articles

Access and License

Access Type: Open
