OAK

Cost-efficient Active Learning for Referring Image Segmentation and Grounding

Metadata
Author(s)
Junbeom Hong
Type
Thesis
Degree
Master
Department
Department of AI Convergence, College of Information and Computing
Advisor
Kim, Sundong
Abstract
Visual grounding (VG) suffers from prohibitive annotation costs because it requires not only precise region labels (i.e., masks or boxes) but also detailed descriptions of those regions. We tackle this annotation bottleneck by formulating active learning (AL) for VG under the realistic setting where the unlabeled pool consists of only raw images without accompanying text. However, estimating sample informativeness without ground-truth text remains challenging, as the model must still assess how well each image disambiguates the referred region from visually similar distractors. To address this, we generate auxiliary region-text pairs using foundation models, and introduce Text-Grounded Region Entropy, a new acquisition function that measures whether the model's confidence collapses onto a single region or disperses across multiple candidates. It allows our method to prioritize images with strong cross-region competition, i.e., visually ambiguous yet highly informative ones. We further design a cost-efficient annotation interface that reduces the labor-intensive labeling of both masks and expressions to just a few clicks. In experiments, our AL framework consistently outperforms several AL baselines on referring image segmentation (RIS) and referring expression comprehension (REC) benchmarks, while achieving up to 6× faster mask labeling and 1.4× faster text labeling in the user study.
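The acquisition idea in the abstract — scoring an image by whether the model's confidence collapses onto one region or spreads across several candidates — can be illustrated with a Shannon entropy over per-region confidences. This is a minimal sketch, not the thesis's exact formulation: the function names, the softmax normalization, and the dictionary-based pool are all illustrative assumptions.

```python
import math

def region_entropy(region_scores):
    """Shannon entropy over softmax-normalized per-region confidences.

    High entropy: confidence is dispersed across candidate regions
    (a visually ambiguous, informative image). Low entropy: confidence
    has collapsed onto a single region. Illustrative only; the actual
    Text-Grounded Region Entropy may normalize or weight differently.
    """
    m = max(region_scores)                      # subtract max for stability
    exps = [math.exp(s - m) for s in region_scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(pool, budget):
    """Rank unlabeled images by region entropy; pick the top `budget`.

    `pool` maps image id -> list of per-region confidence scores
    (e.g., from auxiliary region-text pairs produced by foundation
    models, as the abstract describes).
    """
    ranked = sorted(pool.items(),
                    key=lambda kv: region_entropy(kv[1]),
                    reverse=True)
    return [img_id for img_id, _ in ranked[:budget]]
```

Under this sketch, an image whose candidate regions receive nearly equal scores (strong cross-region competition) ranks above one where a single region dominates, matching the prioritization the abstract describes.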
URI
https://scholar.gist.ac.kr/handle/local/33703
Fulltext
http://gist.dcollection.net/common/orgView/200000944972
Alternative Author(s)
홍준범
Appears in Collections:
Department of AI Convergence > 3. Theses(Master)
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.