Traffic Accident Explanation Via Large Vision and Language Model
- Abstract
- Traffic accidents are a growing global concern, with their number increasing year by
year. In this context, determining whether a vehicle is currently at risk of an accident has
become crucial, and the field of traffic accident anticipation has emerged to address this
problem. Traffic accident anticipation has traditionally relied on extracting spatiotemporal
information from visual input. However, recent studies have suggested that including
linguistic cues alongside visual information can effectively enhance the recognition of
accident situations. Moreover, recent research has shown that Large Vision and Language
Models (LVLMs) also achieve impressive results in video understanding. In light of this, our
paper proposes a model that fine-tunes an LVLM to comprehend accident footage and bases
accident prediction on that understanding.
In this paper, we construct an accident video-instruction dataset from existing datasets
that describe the causes of accidents. We fine-tune the LVLM on this dataset so that the
model can generate descriptions of accident footage, and we then use the model's outputs to
calculate an accident occurrence probability score. The results confirm that our model
accurately describes accident footage and effectively infers the probability of accidents.
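The abstract does not specify how the LVLM's outputs are turned into an accident occurrence probability score. A common approach in this setting is to apply a softmax over the logits of candidate answer tokens; the sketch below assumes a hypothetical two-way "accident" vs. "no accident" framing and made-up token names, purely for illustration:

```python
import math

def accident_probability(token_logits: dict[str, float]) -> float:
    """Convert raw LVLM output logits for two hypothetical answer tokens
    ("accident" vs. "no_accident") into a probability score via softmax.
    The token names and binary framing are illustrative assumptions,
    not the method documented in this thesis."""
    logits = [token_logits["accident"], token_logits["no_accident"]]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[0] / sum(exps)  # softmax probability of the "accident" token

# Example with made-up logits:
score = accident_probability({"accident": 2.0, "no_accident": 0.5})
```

With these example logits the score is sigmoid(2.0 - 0.5) ≈ 0.82; any monotone calibration of the model's answer logits would serve the same role.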
- Author(s)
- Taehyung Gil
- Issued Date
- 2024
- Type
- Thesis
- URI
- https://scholar.gist.ac.kr/handle/local/19851
- Open Access &amp; License
-
- File List
-
Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.