Multi-Agent Collaborative Target Search Based on the Multi-Agent Deep Deterministic Policy Gradient with Emotional Intrinsic Motivation

Published:2023-11-01 Issue:21 Volume:13 Page:11951
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Zhang Xiaoping¹²,Zheng Yuanpeng¹,Wang Li¹,Abdulali Arsen²,Iida Fumiya²

Affiliation:

1. School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China

2. Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK

Abstract

Multi-agent collaborative target search is one of the main challenges in the multi-agent field, and deep reinforcement learning (DRL) is a well-suited way to learn such a task. However, DRL often faces the problem of sparse rewards, which reduces its learning efficiency. Introducing intrinsic motivation has proved to be a useful way to compensate for sparse rewards in DRL. Therefore, building on the multi-agent deep deterministic policy gradient (MADDPG) structure, this paper proposes a new MADDPG algorithm with emotional intrinsic motivation, named MADDPG-E, for multi-agent collaborative target search. In MADDPG-E, a new emotional intrinsic motivation module with three emotions (joy, sadness, and fear) is designed. The three emotions are defined by mapping the corresponding psychological knowledge onto the situations the agents encounter in the environment. An emotional steady-state variable function H is then designed to judge how good the current emotions are. Based on H, an emotion-based intrinsic reward function is finally proposed. With this module, the multi-agent system always tries to reach a joyful state, which means it always learns to search for the target. To show the effectiveness of MADDPG-E, two kinds of simulation experiments are carried out, one with fixed initial positions and one with random initial positions, and the results are compared with MADDPG and with MADDPG-ICM (MADDPG with an intrinsic curiosity module). The results show that, with the designed emotional intrinsic motivation module, MADDPG-E learns faster and more stably, and the advantage is more pronounced in complex situations.
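
To make the idea of an emotion-based intrinsic reward concrete, the following minimal Python sketch shows one way such a term could be added to a sparse extrinsic reward. It is an illustration only: the abstract does not give the definitions of joy, sadness, or fear, nor the form of H, so `emotion_signals`, `steady_state`, `total_reward`, the distance-based inputs, `danger_radius`, and the weight `beta` are all hypothetical stand-ins and not the authors' formulation.

```python
import math


def emotion_signals(dist_to_target, prev_dist_to_target, dist_to_obstacle,
                    danger_radius=1.0):
    """Hypothetical stand-ins for the three emotions (not the paper's definitions).

    joy     grows when the agent moved closer to the target,
    sadness grows when it moved away,
    fear    grows as an obstacle gets inside danger_radius.
    """
    progress = prev_dist_to_target - dist_to_target
    joy = max(progress, 0.0)
    sadness = max(-progress, 0.0)
    fear = max(danger_radius - dist_to_obstacle, 0.0) / danger_radius
    return joy, sadness, fear


def steady_state(joy, sadness, fear):
    """Hypothetical H: a bounded score in (-1, 1), positive when joy dominates."""
    return math.tanh(joy - sadness - fear)


def total_reward(extrinsic, h, beta=0.1):
    """Sparse extrinsic reward plus an intrinsic term scaled by an assumed weight beta."""
    return extrinsic + beta * h


# One step with no extrinsic reward: the agent moved from distance 5 to 4,
# with the nearest obstacle well outside the danger radius.
joy, sadness, fear = emotion_signals(4.0, 5.0, 3.0)
h_good = steady_state(joy, sadness, fear)
reward_good = total_reward(0.0, h_good)

# Another step: the agent moved away from the target and close to an obstacle.
h_bad = steady_state(*emotion_signals(5.0, 4.0, 0.5))
reward_bad = total_reward(0.0, h_bad)
```

In this sketch the agent receives a nonzero learning signal on every step even when the extrinsic reward is zero, which is the general purpose of intrinsic motivation under sparse rewards; in a MADDPG-style setup, each agent's combined reward would presumably be what the critic is trained on.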

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science

Link

https://www.mdpi.com/2076-3417/13/21/11951/pdf

Reference: 43 articles.

1. Consensus in multi-agent systems: A review;Amirkhani;Artif. Intell. Rev.,2022

2. Survey of development and application of multi-agent technology;Li;Comput. Eng. Appl.,2018

3. An integrated localization and control framework for multi-agent formation;Cai;IEEE Trans. Signal Process.,2019

4. A multi-agent based intelligent training system for unmanned surface vehicles;Han;Appl. Sci.,2019

5. Multi-agent reinforcement learning for resource allocation in IoT networks with edge computing;Liu;China Commun.,2020