
Multi-Armed Bandit with Budget Constraint and Variable Costs

Published:2013-06-30 Issue:1 Volume:27 Page:232-238
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Ding Wenkui, Qin Tao, Zhang Xu-Dong, Liu Tie-Yan

Abstract

We study the multi-armed bandit problems with budget constraint and variable costs (MAB-BV). In this setting, pulling an arm will receive a random reward together with a random cost, and the objective of an algorithm is to pull a sequence of arms in order to maximize the expected total reward with the costs of pulling those arms complying with a budget constraint. This new setting models many Internet applications (e.g., ad exchange, sponsored search, and cloud computing) in a more accurate manner than previous settings where the pulling of arms is either costless or with a fixed cost. We propose two UCB based algorithms for the new setting. The first algorithm needs prior knowledge about the lower bound of the expected costs when computing the exploration term. The second algorithm eliminates this need by estimating the minimal expected costs from empirical observations, and therefore can be applied to more real-world applications where prior knowledge is not available. We prove that both algorithms have nice learning abilities, with regret bounds of O(ln B). Furthermore, we show that when applying our proposed algorithms to a previous setting with fixed costs (which can be regarded as our special case), one can improve the previously obtained regret bound. Our simulation results on real-time bidding in ad exchange verify the effectiveness of the algorithms and are consistent with our theoretical analysis.
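The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it assumes Bernoulli rewards and Bernoulli unit costs, and it uses a generic reward-to-cost ratio index with a standard UCB-style bonus scaled by `lam`, an assumed known lower bound on the expected costs. The function name `run_budgeted_ucb` and all parameter names are hypothetical.

```python
import math
import random


def run_budgeted_ucb(reward_means, cost_means, budget, lam, seed=0):
    """Simulate a budgeted bandit with random rewards and random costs.

    Illustrative assumptions (not taken from the paper):
      - rewards and costs are Bernoulli with the given means,
      - lam is a known lower bound on every arm's expected cost,
      - the index is (average reward / average cost) plus a UCB-style bonus.
    """
    assert lam > 0 and all(c >= lam for c in cost_means)
    rng = random.Random(seed)
    k = len(reward_means)
    pulls = [0] * k
    reward_sum = [0.0] * k
    cost_sum = [0.0] * k
    total_reward = 0.0
    remaining = float(budget)
    t = 0

    def index(i):
        # Ratio of empirical means, with the cost clipped at lam,
        # plus an exploration bonus that shrinks as the arm is pulled more.
        avg_reward = reward_sum[i] / pulls[i]
        avg_cost = cost_sum[i] / pulls[i]
        bonus = math.sqrt(2.0 * math.log(t) / pulls[i])
        return avg_reward / max(avg_cost, lam) + bonus / lam

    while True:
        t += 1
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            arm = max(range(k), key=index)

        reward = 1.0 if rng.random() < reward_means[arm] else 0.0
        cost = 1.0 if rng.random() < cost_means[arm] else 0.0

        # Stop at the first pull whose cost cannot be paid; its reward is discarded.
        if cost > remaining:
            break

        remaining -= cost
        pulls[arm] += 1
        reward_sum[arm] += reward
        cost_sum[arm] += cost
        total_reward += reward

    return {
        "total_reward": total_reward,
        "spent": float(budget) - remaining,
        "pulls": pulls,
    }


if __name__ == "__main__":
    result = run_budgeted_ucb([0.2, 0.5, 0.8], [0.5, 0.5, 0.5], budget=200, lam=0.5)
    print(result)
```

Because both the reward and the cost of a pull are random, the number of pulls the budget allows is itself random. This is the feature that distinguishes the variable-cost setting from the costless and fixed-cost settings mentioned in the abstract.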

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 17 articles.

1. Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals;Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining;2024-08-24

2. On Intelligent Placement Decision-Making Algorithms for Wireless Digital Twin Networks via Bandit Learning;IEEE Transactions on Vehicular Technology;2024-06

3. Thompson Sampling with Information Relaxation Penalties;Management Science;2024-05-22

4. Reinforcement learning and bandits for speech and language processing: Tutorial, review and outlook;Expert Systems with Applications;2024-03

5. Joint UAV Trajectory Planning and LEO-Sat Selection in SAGIN;IEEE Open Journal of the Communications Society;2024