Optimization of Predefined-Time Agent-Scheduling Strategy Based on PPO

Published: 2024-07-31 Issue: 15 Volume: 12 Page: 2387
ISSN: 2227-7390
Container-title: Mathematics
Language: en
Short-container-title: Mathematics

Author:

Qi Dingding¹, Zhao Yingjun¹, Li Longyue¹, Jia Zhanxiao²

Affiliation:

1. Air Defense and AntiMissile School, Air Force Engineering University, Xi’an 710043, China

2. Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an 710072, China

Abstract

In this paper, we introduce an agent rescue scheduling approach grounded in proximal policy optimization (PPO), coupled with a singularity-free predefined-time control strategy. The primary objective of this methodology is to improve the efficiency and precision of rescue missions. First, we design an evaluation function closely related to the average flying distance of agents, which provides a quantitative benchmark for assessing different scheduling schemes and assists in optimizing the allocation of rescue resources. Second, we develop a scheduling strategy optimization method using the PPO algorithm. This method automatically learns and adjusts scheduling strategies to adapt to complex rescue environments and varying task demands. The evaluation function provides the feedback signal for the PPO algorithm, ensuring that the algorithm can precisely adjust the scheduling strategies to achieve optimal results. Third, to attain stable and precise agent navigation to designated positions, we formulate a singularity-free predefined-time fuzzy adaptive tracking control strategy. This approach dynamically adjusts control parameters in response to external disturbances and uncertainties, thus ensuring that agents arrive precisely at their destinations within the predefined time. Finally, to validate the proposed approach, we built a simulation environment in Python 3.7 and compared PPO with another optimization method, the Deep Q-Network (DQN), using the variation in reward values as the benchmark for evaluation.
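
As a rough illustration of the two ingredients named in the abstract, the sketch below computes an average flying distance for a given agent-to-target assignment and evaluates the standard PPO clipped surrogate objective. It is a minimal sketch and not the authors' implementation: the function names, the 2-D position encoding, the `assignment` list, and the choice of the negative average distance as the reward are all assumptions made for illustration, and the abstract does not give the exact form of the evaluation function.

```python
import math


def average_flying_distance(agent_positions, target_positions, assignment):
    """Mean Euclidean distance between each agent and its assigned target.

    assignment[i] is the index of the target given to agent i
    (a hypothetical encoding, not taken from the paper).
    """
    total = 0.0
    for agent_index, target_index in enumerate(assignment):
        ax, ay = agent_positions[agent_index]
        tx, ty = target_positions[target_index]
        total += math.hypot(tx - ax, ty - ay)
    return total / len(assignment)


def reward(agent_positions, target_positions, assignment):
    # Assumed reward shape: a shorter average distance gives a higher reward.
    return -average_flying_distance(agent_positions, target_positions, assignment)


def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Standard PPO clipped surrogate objective, averaged over samples.

    ratios[k] is pi_new(a|s) / pi_old(a|s) for sample k and
    advantages[k] is the corresponding advantage estimate.
    """
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        terms.append(min(ratio * advantage, clipped_ratio * advantage))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    agents = [(0.0, 0.0), (3.0, 4.0)]
    targets = [(3.0, 4.0), (0.0, 0.0)]
    # Compare two candidate scheduling schemes by their reward.
    print(reward(agents, targets, [0, 1]))
    print(reward(agents, targets, [1, 0]))
    print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))
```

In a training loop, a reward of this kind would feed the advantage estimates passed to the clipped objective; the clipping range `epsilon=0.2` is a commonly used default rather than a value reported in the abstract.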

Funder

National Natural Science Foundation of China

Youth Talent Lifting Project of the China Association for Science and Technology

Publisher

MDPI AG

Link

https://www.mdpi.com/2227-7390/12/15/2387/pdf
