Design and Control of a Cartpole using Deep Reinforcement Learning
- Author(s)
- Usman Imran
- Type
- Thesis
- Degree
- Master
- Department
- Graduate School, Interdisciplinary Division of Integrated Technology (Intelligent Robotics Program)
- Advisor
- Ryu, Jeha
- Abstract
- Classic control requires knowledge of the system equations, models, and the task itself. As robotic systems grow in complexity, it would be beneficial if an agent could learn control policies autonomously. In this study, the classic cartpole problem is controlled using reinforcement learning combined with a deep neural network.
This work first examines the control of a cartpole using deep reinforcement learning in a simulation environment. The environment is created with OpenAI Gym and controlled by the DQN algorithm. Different values of the hyperparameters (gamma, epsilon, lambda, and the learning rate) are evaluated for their effect on the learning process, and optimal values are determined. The effect of the reward function is then examined by comparing a case in which only a positive reward is given against a case in which a negative reward is given along with the positive reward; the latter reduces both the training time and the number of episodes. In the next stage, the deep reinforcement learning algorithm is tested on a physical cartpole system, using the hyperparameter values tuned in simulation together with the negative (inverse) reward function. The main contribution of this work is the introduction of a negative reward for actions that lead to termination, i.e., the pole falling down; combining this negative reward with the positive reward shortens training in both time and episodes. The study shows that deep reinforcement learning can learn the cartpole balancing task without knowledge of the system's dynamics or models, provided that optimal parameter values and a proper reward function are selected, thereby reducing training time and episodes.
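The reward-shaping idea described above (keeping the usual positive per-step reward but penalizing actions that lead to early termination) can be sketched as follows. This is a minimal illustration, not the thesis's exact implementation; the function name `shape_reward`, the penalty value of -1.0, and the step limit of 500 are assumptions.

```python
def shape_reward(reward, done, step, max_steps=500):
    """Shape the cartpole reward: keep the standard positive survival
    reward, but substitute a negative penalty when the episode ends
    early (i.e., the pole fell before the step limit was reached).

    The -1.0 penalty and 500-step limit are illustrative assumptions."""
    if done and step < max_steps:
        return -1.0  # penalize the action that led to failure
    return reward    # otherwise keep the standard +1 survival reward

# In a Gym-style training loop the shaping would be applied before
# storing the transition in the replay buffer, e.g.:
#   next_state, reward, done, info = env.step(action)
#   reward = shape_reward(reward, done, step)
#   replay_buffer.append((state, action, reward, next_state, done))
```

Because failed transitions now carry a negative target value, the Q-network learns to avoid actions near the failure boundary sooner, which is consistent with the reduction in training time and episodes reported above.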
- URI
- https://scholar.gist.ac.kr/handle/local/32847
- Fulltext
- http://gist.dcollection.net/common/orgView/200000908515
- Access and License
-
- File List
-
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.