OAK

Unsupervised Natural Language Video Localization

Author(s)
Jinwoo Nam
Type
Thesis
Degree
Master
Department
Graduate School, School of Electrical Engineering and Computer Science
Advisor
Choi, Jonghyun
Abstract
Natural Language Temporal Localization (NLTL) is the task of localizing the temporal segment of a video that a natural language query describes. Although the task has been actively studied in recent literature, it remains far from practical because annotating videos with language is extremely costly.
To reduce annotation costs, we propose a novel task, unsupervised natural language temporal localization (Unsupervised NLTL), which aims to train an NLTL model using only a random text corpus and unlabeled video collections.
As an example approach to this task, we propose PUNTeL (Pseudo-labeling approach for Unsupervised Natural language Temporal Localization), a framework that generates pseudo-labels for the NLTL task and effectively trains an NLTL model with them.
Experimental results show that our unsupervised framework even outperforms several weakly supervised methods on widely used benchmarks such as Charades-STA and ActivityNet-Captions.
URI
https://scholar.gist.ac.kr/handle/local/33281
Fulltext
http://gist.dcollection.net/common/orgView/200000907423
Authorize & License
  • Authorize: Public
Files in This Item:
  • There are no files associated with this item.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.