Authors
Mandel Travis, Liu Yun-En, Brunskill Emma, Popović Zoran
Abstract
Current algorithms for the standard multi-armed bandit problem have good empirical performance and optimal regret bounds. However, real-world problems often differ from the standard formulation in several ways. First, feedback may be delayed instead of arriving immediately. Second, the real world often contains structure that suggests heuristics, which we wish to incorporate while retaining the best-known theoretical guarantees. Third, we may wish to make use of an arbitrary prior dataset without negatively impacting performance. Fourth, we may wish to efficiently evaluate algorithms using a previously collected dataset. Surprisingly, these seemingly disparate problems can be addressed using algorithms inspired by a recently developed queueing technique. We present the Stochastic Delayed Bandits (SDB) algorithm as a solution to these four problems, which takes black-box bandit algorithms (including heuristic approaches) as input while achieving good theoretical guarantees. We present empirical results from both synthetic simulations and real-world data drawn from an educational game. Our results show that SDB outperforms state-of-the-art approaches to handling delay, heuristics, prior data, and evaluation.
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
8 articles.
1. Analyzing and Enhancing Queue Sampling for Energy-Efficient Remote Control of Bandits;2024 IEEE International Mediterranean Conference on Communications and Networking (MeditCom);2024-07-08
2. Multi-armed Bandits with Generalized Temporally-Partitioned Rewards;Lecture Notes in Computer Science;2024
3. Dually Enhanced Delayed Feedback Modeling for Streaming Conversion Rate Prediction;Proceedings of the 32nd ACM International Conference on Information and Knowledge Management;2023-10-21
4. Remote Control of Bandits Over Queues - Relevance of Information Freshness;2023 21st International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt);2023-08-24
5. Energy-Aware Spreading Factor Selection in LoRaWAN Using Delayed-Feedback Bandits;2023 IFIP Networking Conference (IFIP Networking);2023-06-12