Improving esports viewing experience through hierarchical scene detection and tracking
- Abstract
- The role of an observer in esports is to provide spectators with the most engaging scenes in real time. Various studies have sought to automate this process. In this study, we use Vision Transformer (ViT)-based object detection to improve the accuracy of automatic observers. Although ViT-based detection identifies engaging game scenes more accurately, it often causes frequent and abrupt scene changes, which reduce viewer comfort. To address this issue, we propose a novel hierarchical structure that combines scene detection with scene tracking, maintaining high accuracy while ensuring smoother transitions between scenes. This approach also improves inference speed, as the tracking model is faster than the detection model. We computationally evaluated six observer models in terms of accuracy and camera stability, with our method demonstrating significantly more stable camera control. Additionally, user testing indicated a strong preference for our model over those without tracking. A video comparing our method to the state-of-the-art can be viewed at https://youtu.be/gWiU4GACZEg. © The Author(s) 2025.
- Author(s)
- Joo, Ho-Taek; Lee, Sung-Ha; Chung, Insik; Kim, Kyung-Joong
- Issued Date
- 2025-03
- Type
- Article
- DOI
- 10.1038/s41598-025-93692-0
- URI
- https://scholar.gist.ac.kr/handle/local/8995
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.