Unsupervised Natural Language Video Localization
- Author(s)
- Jinwoo Nam
- Type
- Thesis
- Degree
- Master
- Department
- Graduate School, School of Electrical Engineering and Computer Science
- Advisor
- Choi, Jonghyun
- Abstract
- Natural Language Temporal Localization (NLTL) is the task of localizing the temporal segment of a video that is described by a natural language query. The task is actively studied in recent literature, but it is still considered far from practical because annotating videos with language is immensely costly.
To alleviate the annotation cost, we propose a novel task, unsupervised natural language temporal localization (Unsupervised NLTL), which aims to train an NLTL model using only a random text corpus and unlabeled video collections.
As an example approach to this task, we propose PUNTeL (Pseudo-labeling approach for Unsupervised Natural language Temporal Localization), a framework that generates pseudo-labels for the NLTL task and uses them to effectively train an NLTL model.
Experimental results show that our unsupervised framework even outperforms several weakly supervised methods on widely used benchmarks such as Charades-STA and ActivityNet-Captions.
- URI
- https://scholar.gist.ac.kr/handle/local/33281
- Fulltext
- http://gist.dcollection.net/common/orgView/200000907423
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.