Fusing RGB and Depth with Self-attention for Unseen Object Segmentation
- Abstract
- We present Synthetic RGB-D Fusion Mask R-CNN (SF Mask R-CNN) for unseen object instance segmentation. Our key idea is to fuse RGB and depth with a learnable spatial attention estimator, named the Self-Attention-based Confidence map Estimator (SACE), at four scales on top of a category-agnostic instance segmentation model. We pre-trained SF Mask R-CNN on a large synthetic dataset and evaluated it on the public WISDOM dataset after fine-tuning on only a small amount of real-world data. Our experiments show that SACE achieves state-of-the-art performance in unseen object segmentation. We also compared feature maps while varying the input modality and fusion method, showing that SACE helps the model learn distinctive object-related features. The code, dataset, and models are available at https://github.com/gist-ailab/SF-Mask-RCNN
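The abstract's core idea, fusing RGB and depth features with a learned spatial confidence map, can be sketched at a single scale as follows. This is a minimal illustration, not the authors' implementation: the 1x1-convolution weights `w`, `b`, the sigmoid confidence map, and the convex blend are assumptions standing in for the full SACE module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def confidence_fusion(f_rgb, f_depth, w, b):
    """Single-scale sketch of confidence-map fusion.

    A learned map (here a 1x1 'convolution' with weights w and bias b
    over the channel-concatenated features) yields a per-pixel
    confidence c in (0, 1); the fused feature is a confidence-weighted
    blend of the RGB and depth features.
    f_rgb, f_depth: (C, H, W); w: (2C,); b: scalar.
    """
    stacked = np.concatenate([f_rgb, f_depth], axis=0)      # (2C, H, W)
    logits = np.tensordot(w, stacked, axes=([0], [0])) + b  # (H, W)
    c = sigmoid(logits)                                     # confidence map
    return c[None] * f_rgb + (1.0 - c[None]) * f_depth      # (C, H, W)

# Toy example with random features and weights (illustrative only).
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
f_rgb = rng.normal(size=(C, H, W))
f_depth = rng.normal(size=(C, H, W))
w, b = rng.normal(size=2 * C), 0.0
fused = confidence_fusion(f_rgb, f_depth, w, b)
print(fused.shape)  # (4, 8, 8)
```

Because the blend is convex, each fused value lies between the corresponding RGB and depth feature values; in SF Mask R-CNN this fusion is applied at four feature scales rather than one.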
- Author(s)
- Lee, Joosoon; Back, Seunghyeok; Kim, Taewon; Shin, Sungho; Noh, Sangjun; Kang, Raeyoung; Kim, Jongwon; Lee, Kyoobin
- Issued Date
- 2021-10-12
- Type
- Conference Paper
- DOI
- 10.23919/iccas52745.2021.9649991
- URI
- https://scholar.gist.ac.kr/handle/local/22028
- Publisher
- IEEE
- Citation
- 2021 21st International Conference on Control, Automation and Systems (ICCAS), pp.1599 - 1605
- ISSN
- 1598-7833
- Conference Place
- Jeju, Korea, Republic of (KR)
Appears in Collections:
- Department of AI Convergence > 2. Conference Papers
- Open Access & License
- File List
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.