
Sliding-Window Thompson Sampling for Non-Stationary Settings

Published:2020-05-26 Issue: Volume:68 Page:311-364
ISSN:1076-9757
Container-title:Journal of Artificial Intelligence Research
Short-container-title:jair

Author:

Trovo Francesco, Paladino Stefano, Restelli Marcello, Gatti Nicola

Abstract

Multi-Armed Bandit (MAB) techniques have been successfully applied to many classes of sequential decision problems in the past decades. However, non-stationary settings -- very common in real-world applications -- have received little attention so far, and theoretical guarantees on the regret are known only for some frequentist algorithms. In this paper, we propose an algorithm, namely Sliding-Window Thompson Sampling (SW-TS), for non-stationary stochastic MAB settings. Our algorithm is based on Thompson Sampling and exploits a sliding-window approach to tackle, in a unified fashion, two different forms of non-stationarity studied separately so far: abruptly changing and smoothly changing. In the former, the reward distributions are constant during sequences of rounds, and their change may be arbitrary and happen at unknown rounds, while, in the latter, the reward distributions smoothly evolve over rounds according to unknown dynamics. Under mild assumptions, we provide upper bounds on the dynamic pseudo-regret of SW-TS for the abruptly changing environment, for the smoothly changing one, and for the setting in which both forms of non-stationarity are present. Furthermore, we empirically show that SW-TS dramatically outperforms state-of-the-art algorithms even when the forms of non-stationarity are taken separately, as previously studied in the literature.
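
The sliding-window idea described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes Bernoulli rewards, a uniform Beta(1, 1) prior per arm, and a fixed window length chosen by the user. The class name `SlidingWindowTS`, the helper `simulate_abrupt_change`, and all parameter values are hypothetical and exist only for illustration. The posterior of each arm is built solely from the observations that fall inside the last `window` rounds, so older evidence is forgotten after a change.

```python
import random
from collections import deque


class SlidingWindowTS:
    """Illustrative sliding-window Thompson Sampling for Bernoulli rewards.

    Assumptions (not taken from the paper's text): Beta(1, 1) priors and a
    window counted in rounds, shared by all arms.
    """

    def __init__(self, n_arms, window, seed=None):
        self.n_arms = n_arms
        self.window = window
        self.history = deque()           # (arm, reward) pairs inside the window
        self.successes = [0] * n_arms    # windowed count of reward == 1
        self.failures = [0] * n_arms     # windowed count of reward == 0
        self.rng = random.Random(seed)

    def select_arm(self):
        # Draw one sample per arm from its windowed Beta posterior and
        # play the arm with the largest sample.
        samples = [
            self.rng.betavariate(1 + s, 1 + f)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(self.n_arms), key=samples.__getitem__)

    def update(self, arm, reward):
        # Record the newest observation.
        self.history.append((arm, reward))
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1
        # Forget the observation that just slid out of the window.
        if len(self.history) > self.window:
            old_arm, old_reward = self.history.popleft()
            if old_reward:
                self.successes[old_arm] -= 1
            else:
                self.failures[old_arm] -= 1


def simulate_abrupt_change(horizon=2000, change_point=1000, window=100, seed=0):
    """Two-arm toy environment whose arm means swap at `change_point`."""
    agent = SlidingWindowTS(n_arms=2, window=window, seed=seed)
    env = random.Random(seed + 1)        # separate stream for the environment
    picks = []
    for t in range(horizon):
        means = (0.9, 0.1) if t < change_point else (0.1, 0.9)
        arm = agent.select_arm()
        reward = 1 if env.random() < means[arm] else 0
        agent.update(arm, reward)
        picks.append(arm)
    return agent, picks
```

Calling `simulate_abrupt_change()` returns the agent and the list of arms played; in this toy setup the agent should shift most of its pulls from arm 0 to arm 1 some rounds after the change point. The window length trades off reactivity against estimation noise: a short window adapts quickly but keeps few samples per arm.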

Publisher

AI Access Foundation

Subject

Artificial Intelligence

Cited by 26 articles.

1. Regulation of reinforcement learning parameters captures long‐term changes in rat behaviour;European Journal of Neuroscience;2024-06-24

2. Online Query-Based Data Pricing with Time-Discounting Valuations;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13

3. Which LLM to Play? Convergence-Aware Online Model Selection with Time-Increasing Bandits;Proceedings of the ACM Web Conference 2024;2024-05-13

4. Adapting bandit algorithms for settings with sequentially available arms;Engineering Applications of Artificial Intelligence;2024-05

5. Actively Adaptive Multi-Armed Bandit Based Beam Tracking for mmWave MIMO Systems;2024 IEEE Wireless Communications and Networking Conference (WCNC);2024-04-21