Abstract: In esports, an observer's role is to show spectators the most engaging scenes in real time, and a range of studies have sought to automate this task. In this study, we use Vision Transformer (ViT)-based object detection to improve the accuracy of automatic observers. Although ViT-based detection identifies engaging game scenes more accurately, it often produces frequent and abrupt scene changes that reduce viewer comfort. To address this issue, we propose a novel hierarchical structure that combines scene detection with scene tracking, maintaining high accuracy while ensuring smoother transitions between scenes. This approach also improves inference speed, because the tracking model is faster than the detection model. We computationally evaluated six observer models in terms of accuracy and camera stability, and our method demonstrated significantly more stable camera control. In addition, user testing indicated a strong preference for our model over models without tracking. A video comparing our method to the state of the art can be viewed at https://youtu.be/gWiU4GACZEg. © The Author(s) 2025.
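The abstract does not say how detection and tracking are combined, so the following is only a minimal sketch of one plausible arrangement, not the authors' implementation. It assumes a slow detector that picks a viewport from scratch and a fast tracker that follows the current viewport and reports a confidence score. The class name `HierarchicalObserver`, the `detect` and `track` callables, and the `min_confidence` and `max_tracked_frames` parameters are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Box = Tuple[float, float, float, float]  # viewport as (x, y, width, height)


@dataclass
class HierarchicalObserver:
    """Hypothetical two-level observer: detect rarely, track in between."""

    detect: Callable[[object], Box]                    # slow model (assumed)
    track: Callable[[object, Box], Tuple[Box, float]]  # fast model (assumed)
    min_confidence: float = 0.5    # assumed threshold for trusting the tracker
    max_tracked_frames: int = 30   # assumed limit before forcing re-detection
    _box: Optional[Box] = None
    _tracked: int = 0
    detector_calls: int = 0

    def step(self, frame: object) -> Box:
        """Return the viewport for one frame."""
        must_detect = self._box is None or self._tracked >= self.max_tracked_frames
        if not must_detect:
            box, confidence = self.track(frame, self._box)
            if confidence >= self.min_confidence:
                # Tracker is trusted: follow the scene smoothly, skip detection.
                self._box = box
                self._tracked += 1
                return box
        # First frame, stale track, or low confidence: fall back to detection.
        self._box = self.detect(frame)
        self._tracked = 0
        self.detector_calls += 1
        return self._box
```

Under these assumptions the detector runs only on the first frame, when the tracker loses confidence, or after a fixed number of tracked frames. That is one way the camera could change scenes less abruptly while the average per-frame cost drops.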

The OAK GIST Repository was built as part of the National Library of Korea's OAK Repository distribution project.