A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

Published: 2022-05-04 Volume: 35
ISSN: 2334-0762
Container-title: The International FLAIRS Conference Proceedings
Short-container-title: FLAIRS

Author:

Huang Shengyi, Ontañón Santiago

Abstract

In recent years, Deep Reinforcement Learning (DRL) algorithms have achieved state-of-the-art performance in many challenging strategy games. Because these games have complicated rules, an action sampled from the full discrete action distribution predicted by the learned policy is likely to be invalid according to the game rules (e.g., walking into a wall). The usual approach to deal with this problem in policy gradient algorithms is to “mask out” invalid actions and just sample from the set of valid actions. The implications of this process, however, remain under-investigated. In this paper, we 1) show theoretical justification for such a practice, 2) empirically demonstrate its importance as the space of invalid actions grows, and 3) provide further insights by evaluating different action masking regimes, such as removing masking after an agent has been trained using masking.
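The masking idea described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names `masked_action_probs` and `sample_valid_action` are hypothetical, and it assumes the policy outputs one logit per discrete action together with a boolean validity mask supplied by the environment. Invalid actions receive zero probability and the remaining probability mass is renormalized over the valid actions.

```python
import math
import random


def masked_action_probs(logits, valid_mask):
    """Softmax over logits with invalid actions forced to probability 0.

    Hypothetical helper for illustration; assumes len(logits) == len(valid_mask).
    """
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    top = max(l for l, ok in zip(logits, valid_mask) if ok)
    # Invalid actions get weight 0, as if their logit were -infinity.
    weights = [math.exp(l - top) if ok else 0.0
               for l, ok in zip(logits, valid_mask)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_valid_action(logits, valid_mask, rng=random):
    """Sample an action index from the masked distribution."""
    probs = masked_action_probs(logits, valid_mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Example: four actions, of which actions 1 and 3 are invalid in this state.
probs = masked_action_probs([1.0, 2.0, 3.0, 4.0], [True, False, True, False])
action = sample_valid_action([1.0, 2.0, 3.0, 4.0], [True, False, True, False])
```

In a policy gradient setting the same masked distribution would typically also be used when computing log-probabilities for the update, so that sampling and learning agree on which actions were available.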

Publisher

University of Florida George A Smathers Libraries

Cited by 81 articles.

1. A reinforcement learning approach with masked agents for chemical process flowsheet design;AIChE Journal;2024-08-30

2. Task-Importance-Oriented Task Selection and Allocation Scheme for Mobile Crowdsensing;Mathematics;2024-08-10

3. A Competition Winning Deep Reinforcement Learning Agent in microRTS;2024 IEEE Conference on Games (CoG);2024-08-05

4. A Simple, Solid, and Reproducible Baseline for Bridge Bidding AI;2024 IEEE Conference on Games (CoG);2024-08-05

5. Dynamic service provisioning in heterogeneous fog computing architecture using deep reinforcement learning;The Journal of Supercomputing;2024-07-29