Boosting Offline Reinforcement Learning with Residual Generative Modeling

Published: 2021-08
Container-title: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence

Author:

Wei Hua¹, Ye Deheng¹, Liu Zhao¹, Wu Hao¹, Yuan Bo¹, Fu Qiang¹, Yang Wei¹, Li Zhenhui²

Affiliation:

1. Tencent AI Lab

2. The Pennsylvania State University

Abstract

Offline reinforcement learning (RL) tries to learn a near-optimal policy from recorded offline experience, without online exploration. Current offline RL research includes: 1) generative modeling, i.e., approximating a policy using fixed data; and 2) learning the state-action value function. While most research focuses on the state-action value function, reducing the bootstrapping error in value function approximation induced by the distribution shift of training data, the effects of error propagation in generative modeling have been neglected. In this paper, we analyze the error in generative modeling. We propose AQL (action-conditioned Q-learning), a residual generative model to reduce policy approximation error for offline RL. We show that our method can learn more accurate policy approximations on different benchmark datasets. In addition, we show that the proposed offline RL method can learn more competitive AI agents in complex control tasks in the multiplayer online battle arena (MOBA) game Honor of Kings.

Publisher

International Joint Conferences on Artificial Intelligence Organization

Cited by 2 articles.

1. Generative AI in mobile networks: a survey; Annals of Telecommunications; 2023-08-17

2. Compressive Features in Offline Reinforcement Learning for Recommender Systems; 2021 IEEE International Conference on Big Data (Big Data); 2021-12-15