1. R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27(4):1054--1078, 1995.
2. S. Bhatt, G. Fang, and P. Li. Piecewise stationary bandits under risk criteria. In International Conference on Artificial Intelligence and Statistics, pages 4313--4335. PMLR, 2023.
3. S. Bubeck, N. Cesa-Bianchi, and G. Lugosi. Bandits with heavy tail. IEEE Transactions on Information Theory, 59(11):7711--7717, 2013.
4. A. Garivier and E. Moulines. On upper-confidence bound policies for non-stationary bandit problems. arXiv preprint arXiv:0805.3415, 2008.
5. Traffic Management in IoT Backbone Networks Using GNN and MAB with SDN Orchestration