A Change-Detection Based Framework for Piecewise-Stationary Multi-Armed Bandit Problem

Published: 2018-04-29 Issue: 1 Volume: 32
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Liu Fang, Lee Joohyun, Shroff Ness

Abstract

The multi-armed bandit problem has been extensively studied under the stationary assumption. In reality, however, this assumption often does not hold because the reward distributions themselves may change over time. In this paper, we propose a change-detection (CD) based framework for multi-armed bandit problems under the piecewise-stationary setting, and study a class of change-detection based UCB (Upper Confidence Bound) policies, CD-UCB, that actively detect change points and restart the UCB indices. We then develop CUSUM-UCB and PHT-UCB, which belong to the CD-UCB class and use the cumulative sum (CUSUM) test and the Page-Hinkley Test (PHT), respectively, to detect changes. We show that CUSUM-UCB obtains the best known regret upper bound under mild assumptions. We also demonstrate the regret reduction of the CD-UCB policies over arbitrary Bernoulli rewards and Yahoo! datasets of webpage click-through rates.
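
The abstract gives no pseudocode, so the following Python sketch only illustrates the general idea of pairing a change detector with restarted UCB indices. It is not the paper's CUSUM-UCB: the two-sided CUSUM rule is the generic textbook form, the index is plain UCB1, and the uniform exploration step is an added assumption. All names and default values (`TwoSidedCusum`, `cd_ucb`, `warmup`, `drift`, `threshold`, `explore_prob`) are hypothetical and chosen for illustration.

```python
import math
import random


class TwoSidedCusum:
    """Generic two-sided CUSUM detector (illustrative; all defaults are assumptions)."""

    def __init__(self, warmup=50, drift=0.05, threshold=5.0):
        self.warmup = warmup        # samples used to estimate the baseline mean
        self.drift = drift          # deviation tolerated before evidence accumulates
        self.threshold = threshold  # alarm level for either cumulative sum
        self.reset()

    def reset(self):
        self.n = 0
        self.baseline_sum = 0.0
        self.g_pos = 0.0  # evidence for an upward shift in the mean
        self.g_neg = 0.0  # evidence for a downward shift in the mean

    def update(self, x):
        """Feed one observation; return True if a change is flagged."""
        self.n += 1
        if self.n <= self.warmup:
            self.baseline_sum += x
            return False
        mean = self.baseline_sum / self.warmup
        self.g_pos = max(0.0, self.g_pos + (x - mean - self.drift))
        self.g_neg = max(0.0, self.g_neg + (mean - x - self.drift))
        return self.g_pos > self.threshold or self.g_neg > self.threshold


def cd_ucb(reward_fn, n_arms, horizon, explore_prob=0.05, seed=0):
    """UCB1 with per-arm change detection; a flagged arm has its statistics restarted.

    reward_fn(arm, t, rng) is a hypothetical callback returning a reward in [0, 1].
    Returns the total reward and a list of (time, arm) restart events.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    detectors = [TwoSidedCusum() for _ in range(n_arms)]
    restarts = []
    total = 0.0
    for t in range(horizon):
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            arm = untried[0]                # pull every (re)started arm once
        elif rng.random() < explore_prob:
            arm = rng.randrange(n_arms)     # uniform exploration so every arm keeps being monitored
        else:
            n_total = sum(counts)           # samples observed since the last restarts
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(n_total) / counts[a]),
            )
        reward = reward_fn(arm, t, rng)
        total += reward
        counts[arm] += 1
        sums[arm] += reward
        if detectors[arm].update(reward):
            # Change flagged: forget this arm's history and restart its index.
            counts[arm] = 0
            sums[arm] = 0.0
            detectors[arm].reset()
            restarts.append((t, arm))
    return total, restarts
```

For instance, calling `cd_ucb(lambda arm, t, rng: float(rng.random() < (0.8 if (arm == 0) == (t < 2000) else 0.3)), n_arms=2, horizon=4000)` simulates two Bernoulli arms whose means swap at step 2000. How quickly the swap is flagged, and how many false alarms occur, depends on the assumed `drift` and `threshold` values.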

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 20 articles.

1. Which LLM to Play? Convergence-Aware Online Model Selection with Time-Increasing Bandits; Proceedings of the ACM Web Conference 2024; 2024-05-13

2. Competing Bandits in Non-Stationary Matching Markets; IEEE Transactions on Information Theory; 2024-04

3. SOQ: Structural Reinforcement Learning for Constrained Delay Minimization With Channel State Information; IEEE Internet of Things Journal; 2024-02-01

4. Nonstationary Stochastic Bandits: UCB Policies and Minimax Regret; IEEE Open Journal of Control Systems; 2024

5. Handling Concept Drift in Non-stationary Bandit Through Predicting Future Rewards; Lecture Notes in Computer Science; 2024