Abstract
We consider a resource-aware variant of the classical multi-armed bandit problem: in each round, the learner selects an arm and sets a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of consumed resources remains below the limit. Otherwise, the observation is censored, i.e., no reward is obtained. For this problem setting, we introduce a measure of regret that incorporates the actual amount of resources consumed in each learning round, the optimality of realizable rewards, and the risk of exceeding the allocated resource limit. Thus, to minimize regret, the learner needs to set a resource limit and choose an arm such that the chance of realizing a high reward within the predefined resource limit is high, while the resource limit itself is kept as low as possible. We propose a UCB-inspired online learning algorithm, which we analyze theoretically in terms of its regret upper bound. In a simulation study, we show that our learning algorithm outperforms straightforward extensions of standard multi-armed bandit algorithms.
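To make the interaction protocol concrete, the following is a minimal Python sketch of the censored-feedback loop together with a UCB-style index over (arm, limit) pairs. All numeric parameters, the discretized limit grid, and the linear trade-off between reward and consumed resources are illustrative assumptions; they do not reproduce the paper's actual regret definition or algorithm.

```python
import numpy as np

# Minimal sketch (hypothetical parameters): a resource-aware bandit in which the
# learner picks an (arm, resource-limit) pair each round.  If the arm's random
# resource consumption stays below the chosen limit, the reward is observed;
# otherwise the round is censored and no reward is obtained.
rng = np.random.default_rng(0)

K = 3                                    # number of arms (assumption)
limits = np.linspace(0.2, 1.0, 5)        # discretized resource limits (assumption)
mean_cost = np.array([0.3, 0.5, 0.7])    # mean resource consumption per arm (assumption)
mean_reward = np.array([0.4, 0.6, 0.8])  # mean reward probability per arm (assumption)

n_actions = K * len(limits)              # each (arm, limit) pair is one "action"
counts = np.zeros(n_actions)
sums = np.zeros(n_actions)               # running sums of observed utilities

def play(arm, tau):
    """Environment step: the reward is observed only if consumption <= tau."""
    consumption = rng.exponential(mean_cost[arm])
    reward = rng.binomial(1, mean_reward[arm])
    observed = consumption <= tau
    return (reward if observed else 0.0), min(consumption, tau)

T = 5000
for t in range(1, T + 1):
    # UCB-inspired index: empirical mean utility plus an exploration bonus.
    ucb = sums / np.maximum(counts, 1) + np.sqrt(2 * np.log(t) / np.maximum(counts, 1))
    ucb[counts == 0] = np.inf            # force each (arm, limit) pair to be tried once
    a = int(np.argmax(ucb))
    arm, tau = a // len(limits), limits[a % len(limits)]

    reward, used = play(arm, tau)
    utility = reward - 0.1 * used        # trade-off weight 0.1 is an assumption
    counts[a] += 1
    sums[a] += utility
```

Treating each (arm, limit) pair as a separate action is only one possible design choice; it ignores the structure across limits of the same arm, which a dedicated algorithm for this setting would exploit.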
Funder
Ludwig-Maximilians-Universität München
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Software
Cited by
1 article.
1. Case-Based Sample Generation Using Multi-Armed Bandits. Case-Based Reasoning Research and Development, 2023.