OAK

Generalized Depth Perception from Everyday Sensors

Abstract
Accurate depth perception is a critical component of applications such as autonomous driving, robotic navigation, and augmented reality. To obtain high-resolution, metric-scale depth without relying on complex and expensive hardware, the most feasible solution has been to combine RGB images with the corresponding sparse depth data from active sensors such as LiDAR and Kinect, a task known as depth completion. This dissertation presents several approaches to enhancing depth perception with commonly available sensors by addressing three fundamental challenges: (1) a better design of the affinity map used for depth completion, (2) a generalizable depth completion task motivated by prompt engineering, and (3) data-efficient strategies that minimize the high cost of dense depth annotation. The first challenge arises from errors at object boundaries in conventional depth completion, where noise or smooth intensity changes in images make the pixel relationships ambiguous. Conventional methods define an affinity map that describes pixel relations in Euclidean space. They often struggle in such regions, which leads to bleeding errors: incorrect depth information spreads from one area into adjacent ones. To mitigate this, this dissertation moves the representation space for pixel relations from Euclidean to hyperbolic space, which is known for its effectiveness in capturing hierarchical relationships. Hyperbolic geometry yields an affinity map that separates unrelated pixels more distinctly by enlarging the distance between them, reducing the chance that incorrect depth information spreads from one to the other. While the hyperbolic geometry-based affinity map significantly enhances pixel-level accuracy in depth completion, it is also vital to address the biases inherent in sensor measurements, as they can limit the effectiveness and applicability of dense depth perception in real-world scenarios.
It is well known that variations in sensor density, sensing patterns, and scan ranges cause significant generalization issues. To overcome these limitations, this dissertation proposes a novel prompt engineering method for depth input, which enables adaptable feature representations tailored to different depth distributions. Integrating this module into foundation models for monocular depth estimation allows these models to generate absolute-scale depth maps without being constrained by specific sensor ranges, which enhances their robustness and versatility. However, adapting these pretrained models remains challenging because indoor and outdoor sensing environments differ significantly. To further tackle the challenge of consistent depth estimation across diverse scenes and sensors, this dissertation defines a universal depth completion problem that acknowledges the significant data diversity between indoor and outdoor environments. This is crucial because variations in conditions, such as sudden snowfall, rain, or fog, can drastically affect depth perception. To enable rapid adaptation, a baseline architecture is designed to estimate depth efficiently. It leverages a foundation model for monocular depth estimation to achieve a comprehensive understanding of 3D scene structures and incorporates a pixel-wise affinity map to align sensor-specific depth data with monocular depth estimates. By embedding features into hyperbolic space, this dissertation constructs implicit hierarchical structures of 3D data, improving both adaptability and generalization even when only a few examples are available.
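The abstract's claim that hyperbolic geometry enlarges the distance between unrelated pixels can be illustrated with a small numerical sketch. It uses the standard Poincaré ball model with curvature -1, which is an assumption here: the abstract does not say which hyperbolic model, curvature, or affinity function the dissertation uses, and the function names and sample points below are hypothetical.

```python
import math


def poincare_distance(x, y):
    """Geodesic distance between two points strictly inside the unit
    Poincare ball (curvature -1). Illustrative only; not the thesis code."""
    sq_norm_x = sum(v * v for v in x)
    sq_norm_y = sum(v * v for v in y)
    if sq_norm_x >= 1.0 or sq_norm_y >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_x) * (1.0 - sq_norm_y))
    return math.acosh(arg)


def euclidean_distance(x, y):
    """Ordinary Euclidean distance, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Two hypothetical feature pairs with the same Euclidean gap of 0.1:
# one pair near the origin, one pair near the boundary of the ball.
near_origin = ((0.0, 0.0), (0.1, 0.0))
near_boundary = ((0.85, 0.0), (0.95, 0.0))

for name, (p, q) in (("near origin", near_origin),
                     ("near boundary", near_boundary)):
    print(name,
          "euclidean:", round(euclidean_distance(p, q), 4),
          "hyperbolic:", round(poincare_distance(p, q), 4))
```

Both pairs are equally far apart in Euclidean terms, but the pair near the boundary is several times farther apart in hyperbolic terms. An affinity that decays with distance would therefore assign that pair a much lower weight, which is the kind of separation the abstract describes.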
Author(s)
박진휘
Issued Date
2025
Type
Thesis
URI
https://scholar.gist.ac.kr/handle/local/19325
Alternative Author(s)
Jin-Hwi Park
Department
Graduate School, AI Graduate School
Advisor
Jeon, Hae-Gon
Table Of Contents
Abstract
List of Contents
List of Tables
List of Figures
1 Introduction
1.1 Problem Definition
1.2 Scope of the Research
1.2.1 Hierarchical Affinity Learning with Hyperbolic Geometry
1.2.2 Depth Perception with Diverse Commercial Sensors
1.2.3 Universal Depth Completion with Minimal Resources
1.3 Outline of Dissertation
2 Hierarchical Affinity Learning with Hyperbolic Geometry
2.1 Introduction
2.2 Related Works
2.3 Mathematical Preliminaries
2.3.1 Background of Hyperbolic Geometry
2.3.2 Rationale: Hyperbolic Representation for Affinity
2.3.3 Pixel-level Hyperbolicity
2.4 Hyperbolic Convolution Layer
2.5 Hyperbolic Affinity Learning Module
2.6 Learning Affinity with Hyperbolic Representation
2.7 Experimental Results and Analysis
2.7.1 Depth Completion
2.7.2 Semantic Segmentation
2.7.3 Ablation Study
2.7.4 Analysis
2.8 Conclusion
3 Depth Perception with Diverse Commercial Sensors
3.1 Introduction
3.2 Related Works
3.3 Sensor Biases in Depth Perception: Exploring Diverse Sensor Bias Problems
3.4 Depth Prompting: Foundation Model & Prompting Engineering Method
3.5 Experimental Results and Analysis
3.5.1 Experiment Setup
3.5.2 Experimental Results
3.5.3 Case Studies: Sparsity, Pattern and Range Biases
3.5.4 Ablation Study
3.5.5 Few/Zero-shot Inference on Various Sensors
3.6 Conclusion
4 Universal Depth Completion with Minimal Resources
4.1 Introduction
4.2 Related Works
4.3 Baseline Architecture: Universal Few-shot Depth Perception
4.3.1 Rationale: Foundation Model Usage in Universal Depth Completion
4.3.2 Architecture Design
4.4 Advanced Architecture with Hyperbolic Geometry
4.4.1 Multi-scale Feature Fusion & Hyperbolic Curvature Generation
4.4.2 Sparse-to-Dense Conversion based on Hyperbolic Features
4.4.3 Depth Refinement in Multi-curvature Hyperbolic Space
4.5 Experimental Results and Analysis
4.5.1 Implementation Details
4.5.2 Experiment
4.6 Conclusion
5 Concluding Remark
References
Degree
Doctor
Appears in Collections:
Department of AI Convergence > 4. Theses(Ph.D)
Access and License
  • Access type: Open
File List
  • No related files exist.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.