A priority experience replay actor-critic algorithm using self-attention mechanism for strategy optimization of discrete problems

Published: 2024-06-28 Volume: 10 Page: e2161
ISSN:2376-5992
Container-title:PeerJ Computer Science
language:en

Author:

Sun Yuezhongyi, Yang Boyu

Abstract

In the dynamic field of deep reinforcement learning, the self-attention mechanism has been increasingly recognized. Nevertheless, its application in discrete problem domains has been relatively limited, presenting complex optimization challenges. This article introduces a pioneering deep reinforcement learning algorithm, termed Attention-based Actor-Critic with Priority Experience Replay (A2CPER). A2CPER combines the strengths of self-attention mechanisms with the Actor-Critic framework and prioritized experience replay to enhance policy formulation for discrete problems. The algorithm’s architecture features dual networks within the Actor-Critic model—the Actor formulates action policies and the Critic evaluates state values to judge the quality of policies. The incorporation of target networks aids in stabilizing network optimization. Moreover, the addition of self-attention mechanisms bolsters the policy network’s capability to focus on critical information, while priority experience replay promotes training stability and reduces correlation among training samples. Empirical experiments on discrete action problems validate A2CPER’s adeptness at policy optimization, marking significant performance improvements across tasks. In summary, A2CPER highlights the viability of self-attention mechanisms in reinforcement learning, presenting a robust framework for discrete problem-solving and potential applicability in complex decision-making scenarios.
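The abstract names prioritized experience replay as one component but gives no implementation detail. As a rough illustration only, the sketch below shows the commonly used proportional variant of prioritized replay: transitions are sampled with probability proportional to their priority raised to a power, and importance-sampling weights correct the resulting bias. It is not taken from the article. The class name `PrioritizedReplayBuffer`, the parameters `alpha`, `beta`, and `eps`, and their default values are all assumptions made for this example, and A2CPER's actual replay scheme may differ.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (illustrative sketch, not the paper's code)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps          # keeps every priority strictly positive
        self.data = []          # stored transitions
        self.priorities = []    # one priority per stored transition
        self.next_index = 0     # slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority so that
        # they are likely to be replayed at least once.
        priority = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(priority)
        else:
            self.data[self.next_index] = transition
            self.priorities[self.next_index] = priority
        self.next_index = (self.next_index + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability P(i) = p_i^alpha / sum_k p_k^alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        indices = random.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights (n * P(i))^(-beta), scaled by the
        # largest weight in the batch so that all weights lie in (0, 1].
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        max_weight = max(weights)
        weights = [w / max_weight for w in weights]
        batch = [self.data[i] for i in indices]
        return batch, indices, weights

    def update_priorities(self, indices, td_errors):
        # After a learning step, set each priority to |TD error| + eps.
        for i, delta in zip(indices, td_errors):
            self.priorities[i] = abs(delta) + self.eps
```

In a training loop, a critic update would typically multiply each sampled transition's loss by its weight and then call `update_priorities` with the new temporal-difference errors. The linear-time sampling here is chosen for readability; larger buffers usually rely on a sum-tree instead.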

Publisher

PeerJ

Link

https://peerj.com/articles/cs-2161.pdf

References: 34 articles.

1. Better exploration with optimistic actor critic;Ciosek,2019

2. Phasic policy gradient;Cobbe,2021

3. Assessing cyber-incidents using machine learning;Diallo;International Journal of Information and Computer Security,2018

4. High generalization performance structured self-attention model for knapsack problem;Ding;Discrete Mathematics, Algorithms and Applications,2021

5. Comparison of machine learners on an ABA experiment format of the cart-pole task;Eberding,2022