Multi-Agent Deep Reinforcement Learning Based Dynamic Task Offloading in a Device-to-Device Mobile-Edge Computing Network to Minimize Average Task Delay with Deadline Constraints

Published:2024-08-08 Issue:16 Volume:24 Page:5141
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

He Huaiwen¹, Yang Xiangdong¹˒², Mi Xin¹˒², Shen Hong³, Liao Xuefeng⁴

Affiliation:

1. School of Computer, Zhongshan Institute, University of Electronic Science and Technology of China, Zhongshan 528400, China

2. Computer Science and Engineering School, University of Electronic Science and Technology of China, Chengdu 611731, China

3. Engineering and Technology, Central Queensland University, Rockhampton 4701, Australia

4. School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China

Abstract

Device-to-device (D2D) communication is a pivotal technology for next-generation networks: it allows mobile devices (MDs) to offload tasks directly to one another, making efficient use of idle resources. This paper proposes a novel algorithm for dynamic task offloading between active MDs and idle MDs in a D2D–MEC (mobile edge computing) system. The algorithm uses multi-agent deep reinforcement learning (DRL) to minimize the long-term average delay of delay-sensitive tasks under deadline constraints. Our core innovation is a dynamic partitioning scheme for idle and active devices in the D2D–MEC system, accounting for stochastic task arrivals and multi-time-slot task execution, which has been insufficiently explored in the existing literature. We adopt a queue-based system model to formulate the dynamic task offloading optimization problem. To address the large action space and the coupling of actions across time slots, we model the problem as a Markov decision process (MDP) and solve it with multi-agent proximal policy optimization (MAPPO). We employ a centralized training with decentralized execution (CTDE) framework so that each MD makes offloading decisions based solely on its local system state. Extensive simulations demonstrate the efficiency and fast convergence of our algorithm. Compared with existing sub-optimal results obtained with single-agent DRL, our algorithm reduces the average task completion delay by 11.0% and the ratio of dropped tasks by 17.0%. The proposed algorithm is particularly pertinent to sensor networks, where sensor-equipped mobile devices generate a substantial volume of data that requires timely processing to ensure quality of experience (QoE) and meet the service-level agreements (SLAs) of delay-sensitive applications.
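For readers unfamiliar with MAPPO, the following is a minimal sketch of the clipped surrogate objective from standard PPO, which each agent's policy update in MAPPO is built on. It is not the paper's implementation: the function name `ppo_clip_objective`, the argument names, and the default `clip_eps=0.2` are assumptions chosen for illustration, and the abstract does not state the hyperparameters actually used. Under CTDE, the advantage estimates would typically come from a centralized critic during training, while each MD's policy sees only its local state.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized) for one agent.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates, e.g. from a centralized critic.
    clip_eps: clipping range; 0.2 is an assumed, commonly used default.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(nl - ol)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the new and old policies agree, the ratio is 1 and the objective reduces to the mean advantage. Clipping only takes effect once the policy has moved more than `clip_eps` away from the policy that collected the data.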

Funder

Science and Technology Foundation of Guangdong Province

Publisher

MDPI AG

Link

https://www.mdpi.com/1424-8220/24/16/5141/pdf

References (36 articles)

1. D2D-assisted multi-user cooperative partial offloading, transmission scheduling and computation allocating for MEC;Peng;IEEE Trans. Wirel. Commun.,2021

2. Energy-efficient mode selection and resource allocation for D2D-enabled heterogeneous networks: A deep reinforcement learning approach;Zhang;IEEE Trans. Wirel. Commun.,2020

3. Joint task offloading, D2D pairing, and resource allocation in device-enhanced MEC: A potential game approach;Fang;IEEE Internet Things J.,2021

4. Delay-limited computation offloading for MEC-assisted mobile blockchain networks;Zuo;IEEE Trans. Commun.,2021

5. Joint computing, communication and cost-aware task offloading in D2D-enabled Het-MEC;Abbas;Comput. Netw.,2022