Traffic Accident Explanation Via Large Vision and Language Model
- Abstract
- Traffic accidents are a growing global concern, with their number increasing year by
year. In this context, determining whether a vehicle is currently at risk of an accident has
become crucial, and the field of traffic accident anticipation has emerged to address this
problem. Traffic accident anticipation has traditionally relied on extracting spatiotemporal
information from visual input. However, recent studies have suggested that including
linguistic cues alongside visual information can effectively enhance the recognition of
accident situations. Moreover, recent research has shown that Large Vision and Language
Models (LVLMs) also achieve impressive results in video understanding. In light of this, our
paper proposes a model that fine-tunes an LVLM to comprehend accident footage and bases
accident prediction on that understanding.
In this paper, we construct an accident video-instruction dataset from existing datasets
that describe the causes of accidents. We fine-tune the LVLM on this dataset so that the
model can generate descriptions of accident footage, and we then use the model's outputs to
calculate an accident occurrence probability score. The results confirm that our model
accurately describes accident footage and effectively infers the probability of accidents.
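The abstract does not specify how the LVLM's outputs are turned into an accident occurrence probability score. A common approach in this setting is to apply a softmax over the logits of candidate answer tokens; the sketch below assumes a hypothetical two-way "accident" vs. "no accident" framing and made-up token names, purely for illustration:

```python
import math

def accident_probability(token_logits: dict[str, float]) -> float:
    """Convert raw LVLM output logits for two hypothetical answer tokens
    ("accident" vs. "no_accident") into a probability score via softmax.
    The token names and binary framing are illustrative assumptions,
    not the method documented in this thesis."""
    logits = [token_logits["accident"], token_logits["no_accident"]]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[0] / sum(exps)  # softmax probability of the "accident" token

# Example with made-up logits:
score = accident_probability({"accident": 2.0, "no_accident": 0.5})
```

With these example logits the score is sigmoid(2.0 - 0.5) ≈ 0.82; any monotone calibration of the model's answer logits would serve the same role.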
- Author(s)
- Taehyung Gil
- Issued Date
- 2024
- Type
- Thesis
- URI
- https://scholar.gist.ac.kr/handle/local/19851
- Open Access &amp; License
-
- File List
-
Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.