Scalable Multi-Robot Task Allocation Using Graph Deep Reinforcement Learning with Graph Normalization

Published: 2024-04-19
Volume: 13, Issue: 8, Page: 1561
ISSN: 2079-9292
Container-title: Electronics
Short-container-title: Electronics
Language: en

Author:

Zhang Zhenqiang¹, Jiang Xiangyuan¹, Yang Zhenfa², Ma Sile¹˒², Chen Jiyang¹˒³, Sun Wenxu¹

Affiliation:

1. Institute of Marine Science and Technology, Shandong University, Qingdao 266237, China

2. School of Control Science and Engineering, Shandong University, Jinan 250061, China

3. Shandong Zhengzhong Information Technology Co., Ltd., Jinan 250098, China

Abstract

Task allocation plays an important role in the team efficiency of multi-robot systems. Conventional heuristic and meta-heuristic methods struggle to generate satisfactory solutions in a reasonable computational time, particularly for large-scale multi-robot task allocation problems. This paper proposes a novel approach based on graph deep reinforcement learning, which solves the problem through learning. The framework uses the graph sample-and-aggregate (GraphSAGE) concept as the encoder to extract node features in the context of the graph, followed by a cross-attention decoder that outputs the probability of each task being allocated to each robot. A graph normalization technique applied to the input is also proposed; it enables easy adaptation to real-world applications, and a deterministic solution can be guaranteed. The main advantages of this architecture are its scalability and fast feed-forward inference: whether a case involves a varying number of robots or tasks, a single depot, multiple depots, or even a mix of single and multiple depots, solutions can be produced with little computational effort. Extensive experiments across various multi-robot task allocation scenarios confirm the high efficiency and robustness of the proposed method and demonstrate its advantages.
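The pipeline summarized in the abstract (graph normalization of the input, a GraphSAGE-style encoder, and a cross-attention decoder that yields task-to-robot probabilities) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The min-corner/max-extent normalization, the fully connected graph, the single mean-aggregation layer, the `(x, y, is_robot)` input features, the softmax over robots for each task, the greedy argmax decoding, and the helper names `normalize_coords`, `sage_layer`, `cross_attention_probs` and `allocate` are all hypothetical assumptions. The weights are random and untrained, so the resulting allocations are only structurally meaningful.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def normalize_coords(points: Sequence[Sequence[float]]) -> List[Vector]:
    """Hypothetical graph normalization: shift to the min corner and divide by
    the largest extent, so every coordinate lies in [0, 1] (aspect ratio kept)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    scale = max(max(xs) - x0, max(ys) - y0) or 1.0  # avoid dividing by zero
    return [[(x - x0) / scale, (y - y0) / scale] for x, y in points]


def matvec(w: List[Vector], v: Vector) -> Vector:
    """Dense matrix-vector product on plain lists."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def sage_layer(h: List[Vector], w_self: List[Vector], w_neigh: List[Vector]) -> List[Vector]:
    """One GraphSAGE-style layer with mean aggregation over a complete graph:
    h_i' = ReLU(W_self h_i + W_neigh * mean_{j != i} h_j)."""
    out = []
    for i, hi in enumerate(h):
        neigh = [hj for j, hj in enumerate(h) if j != i]
        mean = [sum(col) / len(neigh) for col in zip(*neigh)] if neigh else [0.0] * len(hi)
        z = [a + b for a, b in zip(matvec(w_self, hi), matvec(w_neigh, mean))]
        out.append([max(0.0, x) for x in z])  # ReLU
    return out


def cross_attention_probs(robots: List[Vector], tasks: List[Vector]) -> List[Vector]:
    """Tasks act as queries, robots as keys: scaled dot-product scores followed
    by a softmax over robots give P(task t is allocated to robot r)."""
    d = len(robots[0])
    probs = []
    for t in tasks:
        scores = [sum(a * b for a, b in zip(r, t)) / math.sqrt(d) for r in robots]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def random_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def allocate(robot_xy, task_xy, dim: int = 8, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Normalize -> encode -> cross-attend -> greedy decode (untrained weights)."""
    rng = random.Random(seed)  # fixed seed stands in for trained parameters
    n_r = len(robot_xy)
    coords = normalize_coords(list(robot_xy) + list(task_xy))
    feats = [c + [1.0 if i < n_r else 0.0] for i, c in enumerate(coords)]
    w_in = random_matrix(dim, 3, rng)
    w_self = random_matrix(dim, dim, rng)
    w_neigh = random_matrix(dim, dim, rng)
    h = sage_layer([matvec(w_in, f) for f in feats], w_self, w_neigh)
    probs = cross_attention_probs(h[:n_r], h[n_r:])
    # Greedy argmax decoding: the same input always gives the same allocation.
    assignment = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return probs, assignment


# Usage: two robots, three tasks; any counts work with the same functions.
probs, assignment = allocate([(0, 0), (10, 0)], [(1, 1), (9, 1), (5, 8)])
```

Because coordinates are normalized before encoding, a uniformly scaled and shifted copy of the same instance produces the same encoder input in this sketch (up to floating-point rounding), and the greedy decode makes the output deterministic. Nothing in the functions depends on a fixed number of robots or tasks, which loosely mirrors the scalability claim.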

Funder

National Natural Science Foundation of China

Key Research and Development Program of Shandong Province

Project of Natural Science Foundation of Shandong Province

China University Innovation Fund

Qingdao Natural Science Foundation

Postdoctoral Innovation Project of Shandong Province

Qingdao Postdoctoral Funding Project

Publisher

MDPI AG

Link

https://www.mdpi.com/2079-9292/13/8/1561/pdf