Abstract
Multi-agent deep reinforcement learning based on value factorization is one of the many multi-agent deep reinforcement learning methods and an active research topic in the field. To address environmental instability and the exponential growth of the action space in multi-agent systems, it uses constraints to decompose the joint action-value function of the multi-agent system into a specific combination of individual action-value functions. This paper first explains the motivation for value function factorization and then introduces the fundamentals of multi-agent deep reinforcement learning. Next, the value-factorization-based algorithms are divided into simple factorization algorithms and attention-mechanism-based algorithms, according to whether additional mechanisms are incorporated and which mechanisms are introduced. Several typical algorithms are then introduced, and their advantages and disadvantages are compared and analyzed. Finally, the content covered in this paper is summarized.
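The factorization idea described in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible combination, a plain sum of individual action values, which the abstract does not specify; the function names and the Q-values are hypothetical and chosen only for illustration. Under this assumption, each agent can pick its own greedy action independently, and the result matches an exhaustive search over the exponentially large joint action space.

```python
from itertools import product

def joint_q(per_agent_q, joint_action):
    """Joint action value under an assumed additive factorization:
    the sum of each agent's value for its own action."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))

def greedy_decentralized(per_agent_q):
    """Each agent maximizes its own value: cost grows linearly with agents."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)

def greedy_centralized(per_agent_q):
    """Exhaustive search over all joint actions: cost grows exponentially."""
    action_ranges = [range(len(q)) for q in per_agent_q]
    return max(product(*action_ranges),
               key=lambda ja: joint_q(per_agent_q, ja))

# Hypothetical individual action values: 3 agents, 3 actions each.
per_agent_q = [
    [0.1, 0.7, 0.2],
    [0.5, 0.4, 0.9],
    [0.3, 0.0, 0.1],
]

decentralized = greedy_decentralized(per_agent_q)
centralized = greedy_centralized(per_agent_q)
print(decentralized, centralized)
```

With three agents and three actions each, the centralized search evaluates 27 joint actions while the decentralized selection inspects only 9 individual values; both select the same joint action here because the assumed combination is a sum.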
Publisher
Darcy & Roy Press Co. Ltd.