Abstract: Natural Language Temporal Localization (NLTL) is the task of localizing the temporal segment of a video that a natural language query describes. The task has been actively studied in recent literature, but it is still considered far from practical because of the immense cost of annotating videos with language.
To alleviate the annotation cost, we propose a novel task, unsupervised natural language temporal localization (Unsupervised NLTL), which aims to train an NLTL model using only a random text corpus and unlabeled video collections.
As an example approach to the task, we propose PUNTeL (Pseudo-labeling approach for Unsupervised Natural language Temporal Localization), a framework that generates pseudo-labels for the NLTL task and effectively trains an NLTL model with them.
Experimental results show that our unsupervised framework outperforms even several weakly-supervised methods on widely used benchmarks such as Charades-STA and ActivityNet-Captions.

Appears in Collections: Department of Electrical Engineering and Computer Science > 3. Theses(Master)
