Q-Sorting: An Algorithm for Reinforcement Learning Problems with Multiple Cumulative Constraints

Published: 2024-06-28 Issue: 13 Volume: 12 Page: 2001
ISSN: 2227-7390
Container-title: Mathematics
Language: en
Short-container-title: Mathematics

Author:

Huang Jianfeng¹, Lu Guoqiang¹, Li Yi¹, Wu Jiajun¹

Affiliation:

1. College of Engineering, Shantou University, Shantou 515063, China

Abstract

This paper proposes a method and an algorithm called Q-sorting for reinforcement learning (RL) problems with multiple cumulative constraints. The primary contribution is a mechanism for dynamically determining the focus of optimization among the multiple cumulative constraints and the objective. The action to execute is picked in two steps: first, actions that could break the constraints are filtered out; second, the remaining actions are sorted in descending order of the Q values of the focus. The algorithm was originally developed for the classic tabular value representation and the episodic setting of RL, but the idea can be extended to methods with function approximation and to the discounted setting. Numerical experiments are carried out on an adapted Gridworld and on a motor speed synchronization problem, each with one and with two cumulative constraints. Simulation results validate the effectiveness of Q-sorting in that the cumulative constraints are honored both during and after the learning process. The advantages of Q-sorting are further shown through comparison with the method of lumped performances (LP), which takes constraints into account through weighting parameters. Q-sorting outperforms LP in ease of use (no trial and error is needed to determine the values of the weighting parameters) and in performance consistency (6.1920 vs. 54.2635 rad/s for the standard deviation of the cumulative performance index over 10 repeated simulation runs). The authors conclude that it has great potential for practical engineering use.
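The two-step action selection described in the abstract (filter, then sort by the Q values of the focus) might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the focus is chosen or how the filter is evaluated, so the function name `q_sorting_action`, the upper-bound form of the constraints, the `budgets` argument, the fallback when no action passes the filter, and the negation of constraint Q values when a constraint is the focus are all assumptions.

```python
import numpy as np


def q_sorting_action(q_objective, q_constraints, budgets, focus):
    """Select an action for a single state (hypothetical helper).

    q_objective:   shape (n_actions,), estimated cumulative objective per action
    q_constraints: shape (n_constraints, n_actions), estimated cumulative
                   constraint quantity per action (assumed: lower is safer)
    budgets:       shape (n_constraints,), assumed upper bound for each constraint
    focus:         -1 to focus on the objective, or i >= 0 to focus on constraint i
    """
    q_objective = np.asarray(q_objective, dtype=float)
    q_constraints = np.asarray(q_constraints, dtype=float)
    budgets = np.asarray(budgets, dtype=float)

    # Step 1 (filter): keep actions whose estimated cumulative constraint
    # quantities all stay within their budgets.
    feasible = np.all(q_constraints <= budgets[:, None], axis=0)
    candidates = np.flatnonzero(feasible)
    if candidates.size == 0:
        # Assumed fallback: if nothing passes the filter, consider every action.
        candidates = np.arange(q_objective.size)

    # Step 2 (sort): order the remaining actions by the Q values of the focus,
    # in descending order. For a constraint focus the values are negated so
    # that a smaller constraint quantity ranks higher (an assumption).
    focus_q = q_objective if focus < 0 else -q_constraints[focus]
    order = candidates[np.argsort(-focus_q[candidates], kind="stable")]
    return int(order[0]), order.tolist()


if __name__ == "__main__":
    q_obj = [1.0, 3.0, 2.0]
    q_con = [[0.5, 2.0, 1.0]]  # one cumulative constraint
    print(q_sorting_action(q_obj, q_con, budgets=[1.5], focus=-1))
    print(q_sorting_action(q_obj, q_con, budgets=[1.5], focus=0))
```

In this toy call, action 1 has the highest objective value but is removed by the filter because its estimated constraint quantity exceeds the assumed budget, so the ranking is taken over the remaining actions only.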

Funder

STU Scientific Research Initiation

Publisher

MDPI AG

Link

https://www.mdpi.com/2227-7390/12/13/2001/pdf

References (41 articles)

1. Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press.

2. Mnih (2013). Playing Atari with Deep Reinforcement Learning, Nature.

3. Silver (2016). Mastering the Game of Go with Deep Neural Networks and Tree Search, Nature.

4. Geibel, P. (2006). Reinforcement Learning for MDPs with Constraints, Springer.

5. Julian (2002). QoS and Fairness Constrained Convex Optimization of Resource Allocation for Wireless Cellular and Ad Hoc Networks, Proceedings of the Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies.