Learning Strategies for Gaussian Process Regression Imitation Learning Using Stochastic Reward-Based Optimization

Author(s): Jee Yong Park

Type: Thesis

Degree: Master

Department: Graduate School, School of Integrated Technology (Intelligent Robotics Program)

Advisor: Ryu, Jeha

Abstract: On-line motion planning in dynamically changing environments poses a significant challenge in the design of autonomous robotic systems. Conventional methods often require intricate design choices, while modern deep reinforcement learning (DRL) approaches demand vast amounts of robot motion data. These requirements become even more complex and extensive for systems with higher degrees of freedom (DOFs). Imitation learning addresses these issues by allowing human experts to intuitively provide demonstrations for robotic agents to learn from. Some learning frameworks combine Gaussian process regression (GPR) models with stochastic reward-based trajectory optimization algorithms to learn optimal policies from a minimal number of demonstrations. These algorithms, however, may yield suboptimal policies, such as trajectories with abrupt changes, and may still require a substantial amount of training data.
This study scrutinizes an experiment from the approach of Ewerton et al. [1], in which the authors' method yields suboptimal policies. The causes of these suboptimalities are identified and mitigated by introducing problem-specific objective functions, such as a trajectory smoothness cost, into the optimization algorithm. Additionally, a novel continuous-space data sampling method based on Voronoi tessellation is proposed to enhance the data efficiency of the learning algorithm. This method leverages the off-line nature of the learning algorithm and the spatial correlation between the input features and the output of the trajectory prediction model to balance exploration and exploitation. Experimental results demonstrate that the proposed method can improve the learned policies with less data than the previous baseline method.
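
As an illustration of what a trajectory smoothness cost might look like, the sketch below penalizes the sum of squared finite-difference accelerations along a discretized trajectory. This is an assumed, generic formulation and not necessarily the objective used in the thesis; the function name `smoothness_cost`, the time step `dt`, and the example trajectories are hypothetical.

```python
import numpy as np


def smoothness_cost(trajectory, dt=1.0):
    """Sum of squared finite-difference accelerations (hypothetical cost).

    trajectory: array of shape (T,) or (T, D), one row per time step.
    dt: assumed constant time step between samples.
    """
    traj = np.asarray(trajectory, dtype=float)
    if traj.ndim == 1:
        traj = traj[:, None]  # treat a 1-D input as a single-DOF trajectory
    # The second-order difference approximates acceleration at interior points.
    accel = np.diff(traj, n=2, axis=0) / dt**2
    return float(np.sum(accel**2))


# A straight-line trajectory has (numerically) zero cost,
# while one with abrupt direction changes is penalized.
straight = np.linspace(0.0, 1.0, 5)
kinked = np.array([0.0, 0.25, 1.0, 0.75, 1.0])
```

A term of this kind could be added, with a weight, to the reward used by a stochastic trajectory optimizer so that candidates with abrupt changes score worse than smooth ones.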

URI: https://scholar.gist.ac.kr/handle/local/19451

Fulltext: http://gist.dcollection.net/common/orgView/200000884009

Alternative Author(s): 박지용

Appears in Collections: Department of AI Convergence > 3. Theses(Master)

Access and License

Access Type: Open
