
Enhancing Video Analysis of Car Accidents Using Multimodal Large Language Models with Effective Prompting Techniques

Author(s)
Inho Park
Type
Thesis
Degree
Master
Department
Graduate School, School of Mechanical Engineering
Advisor
Lee, Yong-Gu
Abstract
In this study, we applied instruction tuning to Large Language Models (LLMs) to ensure that their outputs align with users' expectations. This approach assumes that autoregressive (AR) LLMs take context into account when trained on large datasets. However, creating specialized instruction-tuning datasets, such as those for accident video analysis, is challenging because data collection is difficult and extensive processing is costly and time-consuming. To address these challenges, this research introduces an approach that exploits the structural properties of prompts through the chain-of-prompts technique, without extensive training on data. Additionally, this study introduces a prompt structure called Diagnosticity to enhance the robustness of Large Vision Language Models (LVLMs) on video data, diverging from traditional prompt styles that focus mainly on images or basic tasks. The experiments use a zero-shot approach and therefore avoid training on task-specific data. For testing, the AccidentInsight (AI) Dataset, comprising 1,000 accident video clips with high-quality traffic-accident summaries and six short questions, was used to evaluate models using prompt techniques alone. This paper also critically examines the evaluation methods used for recent LVLMs, which rely primarily on LLM-based evaluation. Rather than relying solely on LLMs, we incorporate traditional metrics such as the Character n-gram F1 Score (CHRF) and MoverScore to propose the H-Score, a new evaluation metric that balances the strengths and weaknesses of n-gram and LLM-based evaluation. This combined approach evaluates LVLM performance on both paragraphs and short texts to provide a more accurate assessment.
URI
https://scholar.gist.ac.kr/handle/local/19253
Fulltext
http://gist.dcollection.net/common/orgView/200000878496
Alternative Author(s)
박인호
Appears in Collections:
Department of Mechanical and Robotics Engineering > 3. Theses(Master)
Access and License
  • Access type: Open
File List
  • No related files exist.
