Practical approaches to apply Human Pose Estimation for real-world applications

Author(s): LEE SANGHYUB

Type: Thesis

Degree: Doctor

Department: College of Information and Computing, Department of AI Convergence (Culture Technology Program)

Advisor: Hong, Jin-Hyuk

Abstract: Human Pose Estimation (HPE) has emerged as a vital area of research in computer vision, aiming to estimate human body configurations from visual input. This dissertation addresses the challenges of HPE by proposing a set of practical methodologies focused on enhancing efficiency, robustness, and real-world applicability. The research is structured around three interconnected studies, each tackling a distinct aspect of pose estimation while collectively contributing to the broader objective of developing scalable and high-performing HPE systems.

The first study centers on multi-view 3D HPE, introducing a lightweight, markerless skeleton tracking algorithm that effectively resolves self-occlusion, a persistent challenge in pose estimation. This is achieved by merging pose candidates derived from multiple RGB-D sensors using a combination of DBSCAN clustering and Kalman filtering. By avoiding reliance on heavy deep learning models, the proposed algorithm is suitable for real-time applications in resource-constrained environments. Experimental evaluations confirm its superiority in tracking limb joints under occlusion, underscoring the importance of sensor fusion and spatial redundancy. However, this approach still requires a careful sensor installation process. Its accuracy may also degrade because of the inherent limitations of depth information, such as susceptibility to environmental interference (e.g., sunlight or reflective surfaces), and because of suboptimal sensor placement or the simultaneous tracking of multiple individuals.

Building upon the foundational insights from the first study, the second study transitions from controlled sensor environments to consumer-grade RGB videos, applying 3D human reconstruction (HR) techniques in the context of dance education.
The resulting system, DanceSculpt (DS), demonstrates how HR can deliver multi-angle visualizations of human movement without the limitations of depth-based systems, such as IR interference and complex calibration. By leveraging 3D avatars reconstructed from monocular input, DS provides learners with detailed visual feedback, improving their understanding of posture, timing, and spatial formation. The successful application of this HR method in dance learning emphasizes its potential for other motion-intensive educational contexts, creating a direct link to the goals of the first study while expanding its practical relevance. Nonetheless, because DS relies on a top-down approach, its inference time grows in proportion to the number of detected individuals, making real-time performance a challenge. Furthermore, its inability to reconstruct poses accurately under severe occlusion, or when most body parts are not visible, remains a significant limitation.

The third study builds upon the system-level insights from the first two investigations by introducing an advanced yet efficient 2D multi-person pose estimation framework that enhances performance within a one-stage architecture. Rather than generalizing the findings, this study focuses on achieving additional performance gains through a more effective integration of instance-centric attention mechanisms. InstaPose, developed in this stage, incorporates a novel Instance-Centric Keypoint Attention (ICKA) mechanism within a DETR-based transformer model. This design directly addresses a key limitation observed in earlier approaches, namely insufficient interaction between instance and keypoint queries, by enhancing contextual coherence and spatial precision. Extensive experiments on the MS COCO and CrowdPose datasets validate the framework’s effectiveness, demonstrating its superiority in crowded scenes with minimal parameter overhead.
The performance gains achieved here reflect lessons learned from both the robust merging strategies of the first study and the reconstruction-based feedback system of the second. However, InstaPose still inherits the limitations of transformer-based architectures, including relatively high computational cost and training complexity. Moreover, the framework has not been extended to 3D HPE, which limits its application in scenarios requiring full spatial understanding.

Together, these three studies form a cohesive research trajectory that progressively abstracts from multi-sensor integration to high-level model architecture. These studies highlight how HPE solutions can be adapted across varying input modalities and use cases, from high-precision tracking systems to educational and real-time applications. This dissertation contributes to the ongoing evolution of HPE by showing that accuracy, efficiency, and scalability are not mutually exclusive but can be realized simultaneously through thoughtful system design and cross-domain insight. The proposed approaches hold promise for a wide range of applications, including interactive learning, health monitoring, sports analysis, and beyond, where understanding and interpreting human motion is essential.

URI: https://scholar.gist.ac.kr/handle/local/31944

Fulltext: http://gist.dcollection.net/common/orgView/200000884269

Alternative Author(s): 이상협

Appears in Collections: Department of AI Convergence > 4. Theses(Ph.D)

Access and License

Access type: Open

OAK GIST Scholar was built as part of the National Library of Korea's OAK Repository distribution project.