
Traffic Accident Explanation Via Large Vision and Language Model

Abstract
Traffic accidents are a growing global concern, with their number rising year after
year. In this context, determining whether a vehicle is currently at risk of an accident has
become crucial, and the field of traffic accident anticipation has emerged to address this issue.
Traffic accident anticipation has traditionally relied on extracting spatiotemporal
information from vision alone. However, recent studies have suggested that combining
linguistic cues with visual information can effectively enhance the recognition of accident
situations. Moreover, recent research has shown that Large Vision and Language Models
(LVLMs) achieve impressive results in video understanding. In light of this, our paper
proposes a model that fine-tunes an LVLM to comprehend accident footage and bases
accident prediction on that understanding.
In this paper, we constructed an accident video-instruction dataset from existing
datasets that describe the causes of accidents. We fine-tuned the LVLM on this dataset so
that the model can generate descriptions of accident footage, and we then used the model's
outputs to calculate an accident occurrence probability score. The results confirm that our
model accurately describes accident footage and effectively infers the probability of
accidents.
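The final step of the pipeline, turning an LVLM's output into an accident occurrence probability score, can be sketched as follows. This is a minimal illustration only, assuming the score is derived by a softmax over the logits of candidate answer tokens; the token names ("Yes"/"No"), the logit values, and the function name are illustrative assumptions, not the thesis's actual prompt format or scoring method.

```python
import math

def accident_probability(logits: dict) -> float:
    """Hypothetical scoring: softmax over candidate answer tokens,
    where the mass assigned to "Yes" serves as the accident score."""
    exps = {tok: math.exp(v) for tok, v in logits.items()}
    total = sum(exps.values())
    return exps["Yes"] / total

# Illustrative logits for the next answer token after the model
# has processed the dashcam clip (values are made up).
score = accident_probability({"Yes": 2.1, "No": 0.7})
print(round(score, 3))  # 0.802
```

Thresholding such a score per frame or per clip would then yield a binary accident-anticipation decision, under the assumptions stated above.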
Author(s)
Taehyung Gil
Issued Date
2024
Type
Thesis
URI
https://scholar.gist.ac.kr/handle/local/19851
Alternative Author(s)
길태형
Department
AI Graduate School
Advisor
Lee, Yong-Gu
Degree
Master
Appears in Collections:
Department of AI Convergence > 3. Theses(Master)
Access & License
  • Access type: Open
File List
  • No associated files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.