Generalized Depth Perception from Everyday Sensors

Author(s)
박진휘 (Jin-Hwi Park)
Type
Thesis
Degree
Doctor
Department
AI Graduate School
Advisor
Jeon, Hae-Gon
Abstract
Accurate depth perception is a critical component of applications such as autonomous driving, robotic navigation, and augmented reality. To obtain high-resolution, metric-scale depth information without relying on complex and expensive hardware, leveraging RGB images together with corresponding sparse depth data from active sensors such as LiDAR and Kinect has become the most feasible solution, a task known as depth completion. This dissertation presents approaches to enhance depth perception using commonly available sensors by addressing three fundamental challenges: (1) a better design of the affinity map used for depth completion, (2) a generalizable depth completion task motivated by prompt engineering, and (3) data-efficient strategies to minimize the high costs associated with dense depth annotation.

The first challenge arises from errors at object boundaries in conventional depth completion, where noise or smooth intensity changes in images cause ambiguity in the construction of pixel relationships. Conventional methods define an affinity map that describes pixel relations in Euclidean space. They often struggle in such regions, leading to bleeding errors in depth perception, which occur when incorrect depth information spreads from one area into adjacent ones. To mitigate this, this dissertation redefines the representation space for pixel relations from Euclidean to hyperbolic space, which is known for its effectiveness in capturing hierarchical relationships. Hyperbolic geometry yields an affinity map with a more distinct separation between unrelated pixels by enlarging the distance between them, reducing the chance of incorrect depth information spreading across boundaries. While the hyperbolic geometry-based affinity map significantly enhances pixel-level accuracy in depth completion, it is also vital to address biases inherent in sensor measurements, as these biases can limit the effectiveness and applicability of dense depth perception in real-world scenarios.
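The distance-enlarging effect the abstract attributes to hyperbolic geometry can be illustrated with the geodesic distance of the Poincaré ball model. This is only a minimal sketch of the underlying geometry; the dissertation's actual affinity map is learned, and the embeddings below are invented for illustration:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-7):
    """Geodesic distance between two points inside the Poincare ball (||x|| < 1)."""
    diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * diff / max(denom, eps))

# Hypothetical pixel embeddings: two points near the ball boundary (e.g.
# pixels on opposite sides of an object edge) end up far apart in
# hyperbolic distance, while the same angular gap near the origin stays
# small -- a separation Euclidean distance alone would not amplify.
far_a, far_b = np.array([0.94, 0.0]), np.array([0.0, 0.94])
near_a, near_b = np.array([0.05, 0.0]), np.array([0.0, 0.05])
print(poincare_distance(far_a, far_b))    # large (boundary pair)
print(poincare_distance(near_a, near_b))  # small (origin pair)
```

Because distances blow up toward the ball's boundary, an affinity built on this metric naturally suppresses links between unrelated regions, which is the bleeding-error mechanism the abstract describes.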
It is well known that variations in sensor density, sensing patterns, and scan ranges lead to significant generalization issues. To overcome these limitations, this dissertation proposes a novel prompt engineering scheme for depth input, enabling adaptable feature representations tailored to different depth distributions. By integrating this module into foundation models for monocular depth estimation, the dissertation allows these models to generate absolute-scale depth maps without being constrained by specific sensor ranges, thereby enhancing their robustness and versatility. However, adapting these pretrained models remains challenging due to the significant differences between indoor and outdoor sensing environments.

To further tackle the challenge of consistent depth estimation across diverse scenes and sensors, this dissertation defines a universal depth completion problem that acknowledges the significant data diversity between indoor and outdoor environments. This is crucial because variations in conditions, such as sudden snowfall, rain, or fog, can drastically affect depth perception. To enable rapid adaptation, a baseline architecture is designed to estimate depth efficiently. It leverages a foundation model for monocular depth estimation to achieve a comprehensive understanding of 3D scene structures and incorporates a pixel-wise affinity map to align sensor-specific depth data with monocular depth estimates. By embedding features in hyperbolic space, this dissertation constructs implicit hierarchical structures of 3D data, thereby improving both adaptability and generalization, even with limited examples.
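The idea of aligning sensor-specific sparse depth with a monocular depth estimate can be sketched, in a heavily simplified form, as a global scale-and-shift least-squares fit over the sparsely measured pixels. The dissertation's pixel-wise affinity map is a learned per-pixel mechanism, so the function below is an assumption for illustration only, not the proposed method:

```python
import numpy as np

def align_to_sparse(mono_depth, sparse_depth, mask):
    """Fit a single scale and shift mapping a relative monocular depth map
    onto metric sparse measurements (global least squares -- a simplified
    stand-in for a learned pixel-wise alignment)."""
    x = mono_depth[mask]                     # monocular depth at measured pixels
    y = sparse_depth[mask]                   # metric sensor depth at those pixels
    A = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, y, rcond=None)
    return scale * mono_depth + shift        # densified metric-scale depth

# Synthetic example: the monocular estimate is correct up to scale/shift,
# and a handful of LiDAR-like samples recovers the metric map.
mono = np.linspace(0.1, 1.0, 100).reshape(10, 10)
metric = 3.0 * mono + 0.5
mask = np.zeros((10, 10), dtype=bool)
mask[::3, ::3] = True                        # sparse sensor pattern
dense = align_to_sparse(mono, metric, mask)
```

A single global fit breaks down exactly where the abstract's per-pixel affinity matters: when scale bias varies across the scene (indoor vs. outdoor ranges, weather-corrupted returns), which motivates the learned, pixel-wise alignment.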
URI
https://scholar.gist.ac.kr/handle/local/19325
Fulltext
http://gist.dcollection.net/common/orgView/200000841823
Alternative Author(s)
Jin-Hwi Park
Appears in Collections:
Department of AI Convergence > 4. Theses(Ph.D)
Access & License
  • Access type: Open
File List
  • No related files are available.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.