A Study on Evolutionary Reinforcement Learning: From Single to Multi-Objective
- Abstract
- For the past several years, the field of artificial intelligence (AI) has seen a series of dramatic improvements thanks to the application of diverse technologies. Supervised and unsupervised learning techniques built on neural networks have proven highly successful, producing noteworthy achievements in many academic domains including, but not limited to, computer vision, natural language processing, physics, and bioengineering. The amount of data they require, however, grows with the complexity of the problems to be solved, and securing data of sufficient size has become one of the most serious concerns in recent AI research. In contrast, Deep Reinforcement Learning (DRL), unlike other machine learning methods that train on pre-collected data, allows agents to interact with an environment and generate their own training data through a process of trial and error. Big data is therefore unnecessary, and this scalability marks DRL as one of the core AI technologies of the future. In particular, its capability to perceive the states of an environment and to learn actions that maximize reward makes DRL exceptionally strong in intelligent control and simulation-based optimization. Nevertheless, the transition of DRL research from simulated to real-world environments has exposed numerous obstacles to its practical application. The most notable among them are learning instability caused by continuously accumulated data, lengthy training times, and the heavy computation arising from complex state spaces and the large number of actions available to the agent.
This thesis introduces a series of DRL-oriented techniques that, combined with evolutionary computation, can resolve the aforementioned problems. Their applications are divided into two parts: single-objective and multi-objective optimization. For single-objective optimization problems, Evolutionary Reinforcement Learning (EvoRL) techniques are introduced that are applicable to both of the two main methodologies, model-based and model-free RL.
The model-based reinforcement learning part consists of the following techniques: Evolutionary Monte-Carlo Tree Search (MCTS), which exploits a genetic algorithm to select useful agent actions while discarding unnecessary ones; an Optimized Action Sequence method, which uses evolutionary search to bring the RL agent closer to optimal solutions; proxy model-based MCTS, which reduces the cost of interacting with the RL environment by building a virtual environment model; a Genetic Algorithm that optimizes the RL parameters to improve learning stability and performance; and an Evolving Population method that simultaneously optimizes both the available actions and the hyperparameters to improve learning stability and the agent's ability to approach optimal solutions.
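As a rough illustration of the action-pruning idea behind Evolutionary MCTS (not the thesis's actual implementation), the following minimal sketch evolves a binary mask over the agent's action set with a genetic algorithm. The `fitness` function is a hypothetical stand-in for evaluating MCTS or RL rollouts restricted to the unpruned actions.

```python
# Hypothetical sketch: evolve a binary mask over the action set with a GA,
# keeping only actions whose mask bit is 1.
import random

N_ACTIONS, POP_SIZE, GENERATIONS, MUT_RATE = 20, 30, 50, 0.05

def fitness(mask):
    # Placeholder: in practice, run MCTS / RL episodes restricted to the
    # actions where mask[i] == 1 and return the average episode return.
    return sum(mask) * random.random()

def crossover(a, b):
    point = random.randrange(1, N_ACTIONS)          # one-point crossover
    return a[:point] + b[point:]

def mutate(mask):
    return [1 - bit if random.random() < MUT_RATE else bit for bit in mask]

population = [[random.randint(0, 1) for _ in range(N_ACTIONS)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:POP_SIZE // 2]                # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

best_mask = max(population, key=fitness)
print("actions kept:", [i for i, bit in enumerate(best_mask) if bit])
```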
The model-free reinforcement learning part consists of the following techniques: a model-free Evolving Population method, which improves learning stability and the agent's ability to approach optimal solutions, and Genetic State-Grouping, which reduces training time and achieves performance approximately equal to that of the model-based methods by grouping together similar states recognized by the agent. Lastly, a non-dominated, policy-induced reinforcement learning technique is introduced, which amplifies the diversity of optimal solutions in multi-objective deep reinforcement learning and is applied in the field of intelligent control.
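To illustrate only the "non-dominated" selection step underlying such multi-objective approaches (not the full algorithm described in the thesis), the sketch below filters a set of candidate policies, each scored on several objectives, down to its Pareto front. The example scores and objective names are hypothetical.

```python
# Hypothetical sketch of non-dominated (Pareto) filtering over candidate policies.
def dominates(a, b):
    """a dominates b if a is no worse on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scores):
    """Return indices of policies not dominated by any other policy."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]

# Example: four policies evaluated on two objectives (higher is better).
policy_scores = [(0.9, 0.2), (0.6, 0.6), (0.3, 0.9), (0.5, 0.5)]
print(pareto_front(policy_scores))   # -> [0, 1, 2]; policy 3 is dominated by policy 1
```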
- Author(s)
- Man-Je Kim
- Issued Date
- 2023
- Type
- Thesis
- URI
- https://scholar.gist.ac.kr/handle/local/18920