Design and Control of a Cartpole using Deep Reinforcement Learning
- Author(s)
- Usman Imran
- Type
- Thesis
- Degree
- Master
- Department
- Graduate School, Interdisciplinary Division of Integrated Technology (Intelligent Robotics Program)
- Advisor
- Ryu, Jeha
- Abstract
- Classic control requires knowledge of the system equations, models, and the task itself. As robotic systems grow in complexity, it would be beneficial if an agent could learn control policies autonomously. In this study, the classic cartpole problem is controlled using reinforcement learning combined with a deep neural network.
This work first examines the control of a cartpole using deep reinforcement learning in a simulation environment. The environment is created with OpenAI Gym and controlled by the DQN algorithm. Different values of the hyperparameters (gamma, epsilon, lambda, and the learning rate) are evaluated for their effect on the learning process, and optimal values are determined. The effect of the reward function is then examined by comparing a case in which only a positive reward is given against a case in which a negative reward is given along with the positive reward; the latter reduces both the training time and the number of episodes. In the next stage, the deep reinforcement learning algorithm is tested on a physical cartpole system, using the hyperparameter values tuned in simulation together with the negative (inverse) reward function. The main contribution of this work is the introduction of a negative reward for actions that lead to termination, i.e., the pole falling down; combining this negative reward with the positive reward shortens training in both time and episodes. The study shows that deep reinforcement learning can learn the cartpole balancing task without knowledge of the system's dynamics or models, provided that optimal parameter values and a proper reward function are selected, thereby reducing training time and episodes.
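The reward-shaping idea described above (keeping the usual positive per-step reward but penalizing actions that lead to early termination) can be sketched as follows. This is a minimal illustration, not the thesis's exact implementation; the function name `shape_reward`, the penalty value of -1.0, and the step limit of 500 are assumptions.

```python
def shape_reward(reward, done, step, max_steps=500):
    """Shape the cartpole reward: keep the standard positive survival
    reward, but substitute a negative penalty when the episode ends
    early (i.e., the pole fell before the step limit was reached).

    The -1.0 penalty and 500-step limit are illustrative assumptions."""
    if done and step < max_steps:
        return -1.0  # penalize the action that led to failure
    return reward    # otherwise keep the standard +1 survival reward

# In a Gym-style training loop the shaping would be applied before
# storing the transition in the replay buffer, e.g.:
#   next_state, reward, done, info = env.step(action)
#   reward = shape_reward(reward, done, step)
#   replay_buffer.append((state, action, reward, next_state, done))
```

Because failed transitions now carry a negative target value, the Q-network learns to avoid actions near the failure boundary sooner, which is consistent with the reduction in training time and episodes reported above.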
- URI
- https://scholar.gist.ac.kr/handle/local/32847
- Fulltext
- http://gist.dcollection.net/common/orgView/200000908515
- Access and License
-
- File List
-
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.