Affiliation:
1. Cornell University, Ithaca, NY, USA
Abstract
We present an efficient algorithm for model-free episodic reinforcement learning on large (potentially continuous) state-action spaces. Our algorithm is based on a novel Q-learning policy with adaptive, data-driven discretization. The central idea is to maintain a finer partition of the state-action space in regions that are frequently visited in historical trajectories and have higher payoff estimates. We demonstrate how our adaptive partitions take advantage of the shape of the optimal Q-function and the joint space without sacrificing worst-case performance. In particular, we recover the regret guarantees of prior algorithms for continuous state-action spaces, which additionally require an optimal discretization as input and/or access to a simulation oracle. Moreover, experiments demonstrate how our algorithm automatically adapts to the underlying structure of the problem, resulting in much better performance compared both to heuristics and to Q-learning with uniform discretization.
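To make the partitioning idea concrete, the following is a minimal, hypothetical sketch of adaptive discretization in a simplified one-step (bandit-style) setting over a one-dimensional action space. The cell structure, splitting threshold, and learning rate are illustrative assumptions, not the constants or the full episodic algorithm from the paper: a cell is subdivided only once its visit count is large relative to its size, so resolution concentrates in frequently visited, high-payoff regions.

import random
from dataclasses import dataclass

@dataclass
class Cell:
    """One cell of the adaptive partition over the action interval [lo, hi)."""
    lo: float
    hi: float
    q: float = 1.0      # optimistic initial payoff estimate
    visits: int = 0

    def width(self) -> float:
        return self.hi - self.lo

class AdaptiveQPartition:
    """Keep a partition of the space, refine cells only where data accumulates,
    and act greedily with respect to the per-cell estimates."""

    def __init__(self) -> None:
        self.cells = [Cell(0.0, 1.0)]

    def select(self) -> Cell:
        # Greedy choice; exploration comes from the optimistic initial values.
        return max(self.cells, key=lambda c: c.q)

    def update(self, cell: Cell, reward: float) -> None:
        cell.visits += 1
        step = 1.0 / cell.visits              # simple averaging rate (illustrative)
        cell.q += step * (reward - cell.q)
        # Split a cell once its visit count is large relative to its size, so
        # finer resolution is spent only on frequently visited, promising regions.
        if cell.visits >= (1.0 / cell.width()) ** 2:
            self._split(cell)

    def _split(self, cell: Cell) -> None:
        mid = (cell.lo + cell.hi) / 2.0
        self.cells.remove(cell)
        self.cells.append(Cell(cell.lo, mid, q=cell.q))
        self.cells.append(Cell(mid, cell.hi, q=cell.q))

if __name__ == "__main__":
    # Toy usage: noisy reward peaked at action 0.7; the partition should end
    # up finest near the peak and coarse elsewhere.
    part = AdaptiveQPartition()
    for _ in range(2000):
        cell = part.select()
        action = random.uniform(cell.lo, cell.hi)
        reward = max(0.0, 1.0 - 4.0 * abs(action - 0.7)) + random.gauss(0.0, 0.1)
        part.update(cell, reward)
    finest = min(part.cells, key=lambda c: c.width())
    print(f"{len(part.cells)} cells; finest cell [{finest.lo:.3f}, {finest.hi:.3f})")

Running this toy loop typically leaves the cells near the reward peak several times narrower than those far from it, which mirrors the qualitative claim in the abstract that the discretization adapts to the structure of the problem.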
Publisher
Association for Computing Machinery (ACM)
Cited by: 12 articles.