Multi-armed Bandit Learning on a Graph

Published: 2023-03-22
Container-title: 2023 57th Annual Conference on Information Sciences and Systems (CISS)

Author:

Zhang Tianpeng¹, Johansson Kasper², Li Na¹

Affiliation:

1. Harvard School of Engineering and Applied Sciences, Cambridge, MA, USA

2. Stanford University, Department of Electrical Engineering, Stanford, CA, USA

Publisher

IEEE

Link

http://xplorestaging.ieee.org/ielx7/10089615/10089616/10089744.pdf?arnumber=10089744

References: 28 articles.

1. Online learning in episodic Markovian decision processes by relative entropy policy search; Zimin; Advances in Neural Information Processing Systems, 2013

2. Minimax regret of switching-constrained online convex optimization: No phase transition; Chen; Advances in Neural Information Processing Systems, 2020

3. Minimax regret bounds for reinforcement learning; Azar; Proceedings of the 34th International Conference on Machine Learning - Volume 70 (ICML'17), 2017

4. Nearly minimax optimal reinforcement learning for discounted MDPs; He; Advances in Neural Information Processing Systems, 2021

5. Phase Transitions and Cyclic Phenomena in Bandits with Switching Constraints

Cited by 2 articles.

1. Cooperative Multi-Agent Graph Bandits: UCB Algorithm and Regret Analysis; 2024 American Control Conference (ACC); 2024-07-10

2. Graph-Enhanced Hybrid Sampling for Multi-Armed Bandit Recommendation; ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2024-04-14