Abstract
This study examines the factors and conditions that affect reinforcement learning performance and proposes a multi-agent DQN system (N-DQN) model to improve them. The N-DQN model is implemented on maze-solving and ping-pong tasks, both examples of delayed-reward environments in which standard DQN learning is difficult to apply. In the performance evaluation, the implemented N-DQN achieves about 3.5 times higher learning performance than the Q-Learning algorithm in a reward-sparse environment, and reaches the goal about 1.1 times faster than DQN. In addition, through the implementation of prioritized experience replay and a reward-acquisition section segmentation policy, the positive bias seen in existing reinforcement learning models seldom or never occurred. However, because the architecture runs many actors in parallel, additional research on making the system more lightweight is needed for further performance improvement. This paper describes in detail the structure of the proposed multi-agent N-DQN architecture, the algorithms used, and its implementation specification.
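The abstract does not give implementation details for the shared prioritized replay it mentions; as a rough illustration only, the following is a minimal sketch, in Python, of the common proportional-priority replay buffer that parallel actors in an N-DQN-style architecture could push into. The class name, the alpha and eps parameters, and the method names are illustrative assumptions, not the paper's code.

import random
from collections import namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")

class PrioritizedReplayBuffer:
    # Proportional prioritized experience replay (a common variant;
    # the paper's exact scheme is not specified in the abstract).

    def __init__(self, capacity, alpha=0.6, eps=1e-5):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priority skews sampling
        self.eps = eps          # keeps every priority strictly positive
        self.buffer = []
        self.priorities = []
        self.pos = 0

    def push(self, transition):
        # New transitions get the current maximum priority so each
        # experience is sampled at least once before being down-weighted.
        max_prio = max(self.priorities, default=1.0)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(max_prio)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Sample indices with probability proportional to priority**alpha.
        weights = [p ** self.alpha for p in self.priorities]
        indices = random.choices(range(len(self.buffer)), weights=weights, k=batch_size)
        return indices, [self.buffer[i] for i in indices]

    def update_priorities(self, indices, td_errors):
        # After a learning step, refresh priorities from the new TD errors.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps

In a multi-actor setup of the kind the abstract describes, each parallel actor would presumably push transitions into one shared buffer of this sort while a single learner samples batches and refreshes priorities from the resulting TD errors.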
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by 10 articles.