
DenseBert4Ret: Deep bi-modal query image retrieval

Abstract
In this study, we focus on the task of retrieving a desired image from an extensive database, which falls under the purview of image retrieval systems. Users may input an image and a text as a query and expect the system to retrieve an image close to the desires expressed in that query. Nowadays, digital media generates more than petabytes of imagery data in a single day, and, due to the internet, this massive amount of data is readily available to users. However, extracting the desired images from such a colossal databank is a challenging task. Users prefer to retrieve an image that reflects what they envision, and they may wish to alter that envisioned image according to their abstract thoughts. For example, Elizabeth wants a laptop similar to her friend's, but with a built-in GPU and in silver color, so she expects the e-business platform to show a laptop matching her wish. This paper devises a multi-modal algorithm for such tasks: it takes a query image and a text as input, where the text reflects the desired modifications, and retrieves an image that is similar to the input image but modified according to the text query. We propose a bi-modal image retrieval system named DenseBert4Ret that learns image and text features concurrently; as the name indicates, DenseNet and BERT models are used for image and text feature extraction, respectively. The system is based on deep learning techniques for the joint representation of image and text features, and the model is trained so that the input image representation is modified according to the user's textual query. Using deep information learning, we trained and tested our model on three challenging real-world datasets, i.e., MIT-States, Fashion200K, and FashionIQ. We also show that the proposed model outperforms its predecessor with tuned parameters. (c) 2022 Elsevier Inc. All rights reserved.
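The abstract describes the architecture only at a high level: DenseNet extracts image features, BERT extracts text features, and the two are fused into a joint representation that is pulled toward the target image. The sketch below is a minimal, hypothetical PyTorch illustration of that idea, not the authors' released code; the fusion MLP, the 512-dimensional embedding, the global average pooling, and the pretrained checkpoint names are all assumptions made for illustration.

```python
# Hypothetical sketch of a DenseNet + BERT composed-image-retrieval model.
# Not the authors' implementation; the fusion head, embedding size, and
# pooling choices are assumptions made purely for illustration.
import torch
import torch.nn as nn
from torchvision.models import densenet121
from transformers import BertModel


class DenseBert4RetSketch(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        # Pretrained backbones for each modality.
        self.image_encoder = densenet121(weights="DEFAULT").features
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        # Project both modalities into a shared embedding space.
        self.image_proj = nn.Linear(1024, embed_dim)  # DenseNet-121 ends at 1024 channels
        self.text_proj = nn.Linear(768, embed_dim)    # BERT-base hidden size is 768
        # Assumed fusion head that composes image and text embeddings.
        self.fusion = nn.Sequential(
            nn.Linear(2 * embed_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def encode_image(self, images):
        f = self.image_encoder(images)   # (B, 1024, H', W') feature maps
        f = f.mean(dim=(2, 3))           # global average pooling -> (B, 1024)
        return nn.functional.normalize(self.image_proj(f), dim=-1)

    def forward(self, images, input_ids, attention_mask):
        img = self.encode_image(images)
        txt = self.text_encoder(input_ids=input_ids,
                                attention_mask=attention_mask).pooler_output
        txt = nn.functional.normalize(self.text_proj(txt), dim=-1)
        # Compose reference-image and modification-text embeddings; training
        # would pull this composed embedding toward the target image's
        # embedding (e.g., with a metric-learning objective).
        fused = self.fusion(torch.cat([img, txt], dim=-1))
        return nn.functional.normalize(fused, dim=-1)
```

Under this reading, retrieval reduces to a nearest-neighbor search: the composed query embedding is compared (e.g., by cosine similarity) against precomputed encode_image embeddings of the gallery, and the closest images are returned.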
Author(s)
Khan, Zafran; Latif, Bushra; Kim, Joonmo; Kim, Hong Kook; Jeon, Moongu
Issued Date
2022-10
Type
Article
DOI
10.1016/j.ins.2022.08.119
URI
https://scholar.gist.ac.kr/handle/local/10587
Publisher
ELSEVIER SCIENCE INC
Citation
INFORMATION SCIENCES, v.612, pp.1171 - 1186
ISSN
0020-0255
Appears in Collections:
Department of Electrical Engineering and Computer Science > 1. Journal Articles
Disclosure and License
  • Disclosure status: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.