Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes

Published: 2012-05-14
Volume: 22
Pages: 306-314
ISSN: 2334-0843
Container title: Proceedings of the International Conference on Automated Planning and Scheduling
Short container title: ICAPS

Authors:

Ari Weinstein, Michael Littman

Abstract

Recent research leverages results from the continuous-armed bandit literature to create a reinforcement-learning algorithm for continuous state and action spaces. The algorithm was initially proposed in a theoretical setting; we provide the first examination of its empirical properties. Through experimentation, we demonstrate the effectiveness of this planning method when coupled with exploration and model learning, and show that, in addition to its formal guarantees, the approach is very competitive with other continuous-action reinforcement learners.
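The abstract does not name the continuous-armed bandit algorithm involved. As an illustration only, here is a minimal sketch of HOO (Hierarchical Optimistic Optimization, Bubeck et al.), a well-known continuous-armed bandit algorithm, over a one-dimensional action interval [0, 1]. It covers only the bandit component, not the planning, exploration, or model-learning machinery the paper evaluates. The reward function `noisy_reward`, the helper names (`Cell`, `hoo`, `recommend`), the parameter values `nu1` and `rho`, and the most-played-path recommendation rule are all assumptions chosen for this example.

```python
import math
import random


class Cell:
    """A node of the HOO tree, covering the action interval [lo, hi)."""

    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.count = 0           # number of rounds routed through this cell
        self.mean = 0.0          # empirical mean of the rewards seen in this cell
        self.children = None     # created lazily on the cell's second visit
        self.b_value = math.inf  # unplayed cells are maximally optimistic


def update_b_values(cell, n, nu1, rho):
    """Recompute U- and B-values bottom-up after n total rounds."""
    if cell.count == 0:
        cell.b_value = math.inf
        return cell.b_value
    # U = empirical mean + confidence width + smoothness (cell-diameter) term.
    u = (cell.mean + math.sqrt(2.0 * math.log(n) / cell.count)
         + nu1 * rho ** cell.depth)
    if cell.children is None:
        cell.b_value = u
    else:
        # Evaluate every child so the whole subtree stays up to date.
        child_bs = [update_b_values(c, n, nu1, rho) for c in cell.children]
        cell.b_value = min(u, max(child_bs))
    return cell.b_value


def hoo(reward_fn, n_rounds, nu1=1.0, rho=0.5, seed=0):
    """Run HOO on actions in [0, 1] and return the root of the tree.

    nu1 and rho are hypothetical smoothness parameters for this sketch.
    """
    rng = random.Random(seed)
    root = Cell(0.0, 1.0, 0)
    for n in range(1, n_rounds + 1):
        # Descend along larger B-values until reaching an unplayed cell.
        path, cell = [root], root
        while cell.count > 0:
            if cell.children is None:
                mid = (cell.lo + cell.hi) / 2.0
                cell.children = [Cell(cell.lo, mid, cell.depth + 1),
                                 Cell(mid, cell.hi, cell.depth + 1)]
            left, right = cell.children
            if left.b_value == right.b_value:
                cell = rng.choice(cell.children)  # break ties at random
            else:
                cell = left if left.b_value > right.b_value else right
            path.append(cell)
        # Play an action drawn uniformly from the new cell; observe a reward.
        reward = reward_fn(rng.uniform(cell.lo, cell.hi), rng)
        # Update counts and running means along the path.
        for c in path:
            c.count += 1
            c.mean += (reward - c.mean) / c.count
        update_b_values(root, n, nu1, rho)
    return root


def recommend(root):
    """Follow the most-played child at each level; return that cell's midpoint."""
    cell = root
    while cell.children is not None and any(c.count > 0 for c in cell.children):
        cell = max(cell.children, key=lambda c: c.count)
    return (cell.lo + cell.hi) / 2.0


def noisy_reward(action, rng):
    # Hypothetical objective peaked at a = 1/3, plus uniform noise in [-0.1, 0.1].
    return 0.9 - 0.8 * abs(action - 1.0 / 3.0) + rng.uniform(-0.1, 0.1)


root = hoo(noisy_reward, n_rounds=1000)
print(root.count, recommend(root))  # root.count always equals n_rounds
```

Recomputing every B-value each round keeps the sketch short but costs time linear in the tree size per round; the optimistic B-value, capped by the best child's B-value, is what steers plays toward promising regions of the continuous action space.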

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Cited by 3 articles.

1. Query-Efficient Adversarial Attack With Low Perturbation Against End-to-End Speech Recognition Systems. IEEE Transactions on Information Forensics and Security, 2023.

2. Deep Reinforcement Learning in Smart Grid: Progress and Prospects. 2022 XXVIII International Conference on Information, Communication and Automation Technologies (ICAT), 2022-06-16.

3. Voronoi Progressive Widening: Efficient Online Solvers for Continuous State, Action, and Observation POMDPs. 2021 60th IEEE Conference on Decision and Control (CDC), 2021-12-14.