Enhancing Video Analysis of Car Accidents Using Multimodal Large Language Models with Effective Prompting Techniques
- Author(s)
- Inho Park
- Type
- Thesis
- Degree
  - Master's
- Department
  - Graduate School, School of Mechanical Engineering
- Advisor
- Lee, Yong-Gu
- Abstract
  - Instruction tuning is applied to large language models (LLMs) so that their outputs align with users' expectations. This approach assumes that autoregressive (AR) LLMs learn to take context into account when trained on large datasets. However, building specialized instruction-tuning datasets, such as those for accident video analysis, is difficult: the data are hard to collect, and the extensive processing they require is costly and time-consuming. To address these challenges, this research introduces an approach that exploits the structural properties of prompts through a chain-of-prompts technique, without extensive training on data. In addition, this study introduces a prompt structure called Diagnosticity to improve the robustness of large vision-language models (LVLMs) on video data, departing from traditional prompt styles that focus mainly on images or basic tasks. The experiments avoid training on task-specific data by using a zero-shot approach. For testing, the AccidentInsight (AI) dataset, comprising 1,000 accident video clips with high-quality traffic-accident summaries and six short questions, was used to evaluate models using prompting techniques alone. This paper also takes a critical look at the evaluation methods used for recent LVLMs, which rely primarily on LLM-based evaluation. Rather than relying solely on LLMs, we incorporate traditional metrics, namely the character n-gram F-score (chrF) and MoverScore, into the H-Score, a new evaluation metric that balances the strengths and weaknesses of n-gram-based and LLM-based evaluation. This combined approach evaluates LVLM performance on both paragraph-length and short-answer texts to provide a more accurate assessment.
- URI
- https://scholar.gist.ac.kr/handle/local/19253
- Fulltext
- http://gist.dcollection.net/common/orgView/200000878496
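The abstract names chrF as one of the traditional n-gram metrics folded into the H-Score. As an illustration only, the sketch below shows how a sentence-level chrF score could be computed for a model-generated accident summary against a reference. It is not the thesis's H-Score (whose combination formula is not given in this record), and the function names `char_ngrams` and `chrf` are hypothetical. It assumes the original chrF definition: character n-gram precision and recall averaged over n = 1..6 with whitespace ignored, combined as an F-beta score with beta = 2. Library implementations such as sacrebleu may differ in details like averaging and whitespace handling.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of `text`, ignoring whitespace (a common chrF convention)."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Hypothetical sentence-level chrF in [0, 1].

    Averages character n-gram precision and recall over n = 1..max_n,
    then combines them with an F-beta score (beta = 2 weights recall more).
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # skip n-gram orders longer than one of the strings
        matches = sum((hyp & ref).values())  # clipped overlap (multiset intersection)
        precisions.append(matches / hyp_total)
        recalls.append(matches / ref_total)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


reference = "A white car ran a red light and hit a truck."
print(round(chrf(reference, reference), 3))  # 1.0
print(chrf("A white car hit a truck.", reference))  # between 0 and 1
```

Because chrF works on characters rather than words, it gives partial credit for morphological variants and near-miss wording. That robustness is one reason n-gram metrics of this kind are used alongside LLM-based judges, which can be inconsistent across runs.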
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.