Affiliation:
1. Suzhou Joint Graduate School, Southeast University, Suzhou, Jiangsu Province 215123, P. R. China
2. School of Mathematics, Southeast University, Nanjing, Jiangsu Province 210096, P. R. China
Abstract
Reinforcement learning has proven to be an effective approach for solving multi-agent coordination problems in dynamic, open environments. For multi-agent cooperation, the mean field multi-agent reinforcement learning method mitigates slow learning, unstable convergence, and poor learning performance; however, the original mean field algorithm extracts features poorly when agents cooperate. To address the large-scale multi-agent coordination problem, this paper improves the mean field multi-agent reinforcement learning algorithm by incorporating a multi-head attention mechanism and designs the attention-based mean field (MFA) structure. The multi-head attention mechanism refines the interactions among agents, extracts more informative cluster features, and enables agents to learn more efficient strategies. This paper first introduces the framework of MFA, then develops the relevant theoretical basis on top of the Q-learning and Actor-Critic algorithms, and finally conducts large-scale multi-agent cooperative experiments on the MAgent platform. The experimental results show that, compared with the baseline algorithms, the attention-based mean field Q-learning (MFQA) and attention-based mean field Actor-Critic (MFACA) algorithms allow large-scale multi-agent clusters to converge to higher rewards and outperform the original mean field multi-agent algorithms.
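The core idea summarized above, replacing the uniform mean of neighbor actions in mean field Q-learning with a multi-head attention-weighted mean, can be illustrated with a minimal sketch. This is a hypothetical NumPy illustration, not the paper's exact network: the projection matrices `Wq`, `Wk`, the dimensions, and the function name are assumptions introduced for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax for the attention weights.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_mean_action(center_obs, neighbor_actions, Wq, Wk, num_heads=2):
    """Attention-weighted mean of neighbor actions (hypothetical sketch).

    center_obs:       (obs_dim,)  observation of the central agent (query source)
    neighbor_actions: (n, act_dim) one-hot or continuous actions of n neighbors
    Wq:               (obs_dim, d) query projection (assumed parameter)
    Wk:               (act_dim, d) key projection (assumed parameter)
    Returns an (act_dim,) vector replacing the plain mean action.
    """
    d = Wq.shape[1]
    dh = d // num_heads                      # per-head key/query dimension
    q = center_obs @ Wq                      # (d,)
    k = neighbor_actions @ Wk                # (n, d)
    head_outputs = []
    for h in range(num_heads):
        qh = q[h * dh:(h + 1) * dh]          # query slice for this head
        kh = k[:, h * dh:(h + 1) * dh]       # key slices for this head
        scores = kh @ qh / np.sqrt(dh)       # scaled dot-product scores, (n,)
        w = softmax(scores)                  # attention weights over neighbors
        head_outputs.append(w @ neighbor_actions)  # weighted mean action
    return np.mean(head_outputs, axis=0)     # average the heads
```

Because each head forms a convex combination of the neighbor actions, the output reduces to the ordinary mean field action when all attention weights are uniform, so this sketch is a strict generalization of the plain mean used by the original algorithm.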
Funder
National Key R&D Program of China
National Natural Science Foundation of China
Jiangsu Provincial Key Laboratory of Networked Collective Intelligence
Publisher
World Scientific Pub Co Pte Ltd