Method of Learning an Accurate State Transition Dynamics Model by Fitting both a Function and Its Derivative Simultaneously
- Abstract
- An accurate state transition dynamics model is an essential component of model-based controllers such as model-based reinforcement learning (MBRL) and model predictive control (MPC). With an accurate state transition dynamics model, simulation can be performed without real-world interaction, so that important business decisions can be made by predicting future states. For example, in a smart factory, the entire manufacturing process can be simulated realistically to design the process sequence and processing time at low cost, without time and space constraints. However, an inaccurate model can lead to wrong decisions.
Traditionally, an accurate model is obtained through analytic modeling, but an analytic model is difficult to derive because complex non-commercial robots have very complex dynamics. Recently, learning a state transition dynamics model from collected data, such as control inputs and the resulting trajectories, has become widely used. However, it is difficult to learn an accurate state transition dynamics model in the real world. For example, if a robot moves randomly in a real environment, it may reach singularities or joint limits, causing unexpected outcomes such as failure, wear, and collisions. An efficient data acquisition method and a data-efficient learning method are therefore necessary.
In the field of function approximation, derivative learning methods that use function values and their derivatives simultaneously have been studied to improve model accuracy. However, such methods have never been applied to state transition dynamics model learning. Previous state transition dynamics model learning methods predict the next state given the current state and action, and show poor prediction accuracy when the dataset is small. Therefore, this thesis proposes a novel MBRL method that uses derivative information, such as velocity and acceleration, as ground-truth derivatives to improve sample efficiency and prediction accuracy. The proposed method reduces the prediction error of the existing method by about 85% in the virtual environment and by about 34% in the real environment. Moreover, the more accurate state transition dynamics model also performs better in goal-reaching task experiments with obstacle avoidance or manipulability maximization.
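The idea of fitting a function and its derivative simultaneously can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy target (sin with derivative cos), the cubic polynomial model, the weighting factor `lam`, and all function names are assumptions chosen only to show how a derivative term enters the training loss.

```python
import math

def features(x, degree):
    # Polynomial basis [1, x, x^2, ...] used as a stand-in for a learned model.
    return [x ** k for k in range(degree + 1)]

def dfeatures(x, degree):
    # Derivative of each basis function with respect to x.
    return [0.0] + [k * x ** (k - 1) for k in range(1, degree + 1)]

def predict(w, phi):
    return sum(wi * pi for wi, pi in zip(w, phi))

def combined_loss(w, xs, ys, dys, lam, degree):
    # Mean of squared value error plus lam times squared derivative error.
    total = 0.0
    for x, y, dy in zip(xs, ys, dys):
        total += (predict(w, features(x, degree)) - y) ** 2
        total += lam * (predict(w, dfeatures(x, degree)) - dy) ** 2
    return total / len(xs)

def fit(xs, ys, dys, lam=1.0, degree=3, lr=0.05, steps=2000):
    # Plain gradient descent on the combined loss.
    w = [0.0] * (degree + 1)
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * (degree + 1)
        for x, y, dy in zip(xs, ys, dys):
            phi, dphi = features(x, degree), dfeatures(x, degree)
            r = predict(w, phi) - y      # value residual
            dr = predict(w, dphi) - dy   # derivative residual
            for k in range(degree + 1):
                grad[k] += 2.0 * (r * phi[k] + lam * dr * dphi[k]) / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Toy data: five samples of sin(x) with ground-truth derivatives cos(x).
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [math.sin(x) for x in xs]
dys = [math.cos(x) for x in xs]

w = fit(xs, ys, dys)
print("weights:", [round(wi, 4) for wi in w])
print("combined loss:", combined_loss(w, xs, ys, dys, lam=1.0, degree=3))
```

Setting `lam=0.0` recovers ordinary value-only regression, which gives a simple baseline for comparing the two objectives on small datasets.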
Finally, the type of activation function plays an important role in derivative learning in terms of accuracy and convergence. However, the effects of various activation functions on derivative learning have never been investigated. Therefore, the derivative characteristics that make an activation function suitable are analyzed, and several experiments are conducted to compare performance across activation functions. In these experiments, the swish activation function shows the best performance.
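As a small illustration of what "derivative characteristics" can mean, the sketch below computes swish and its analytic derivative and prints them next to the derivative of ReLU. The standard definition swish(x) = x * sigmoid(x) is assumed, and the comparison with ReLU is an illustrative choice made here; it is not necessarily the analysis carried out in the thesis.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def swish(x):
    # Standard swish: x * sigmoid(x).
    return x * sigmoid(x)

def swish_grad(x):
    # Analytic derivative: sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x)).
    s = sigmoid(x)
    return s + x * s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: a step function, constant on each side of zero.
    return 1.0 if x > 0 else 0.0

def central_diff(f, x, h=1e-5):
    # Numerical derivative used to check the analytic formula.
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  swish'={swish_grad(x):+.4f}  relu'={relu_grad(x):+.1f}")
```

The derivative of swish varies smoothly with the input, whereas the derivative of ReLU is piecewise constant, so a ReLU network's derivative with respect to its input cannot change smoothly within a linear region.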
- Author(s)
- Youngho Kim
- Issued Date
- 2023
- Type
- Thesis
- URI
- https://scholar.gist.ac.kr/handle/local/19488
- Access and License
-
- File List
-
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.