Abstract
We consider a variation of the upper confidence bound (UCB) strategy for the multi-armed bandit (MAB) problem in a batch processing setting. Invariant descriptions with a unit control horizon are obtained for the upper bounds used in the strategy and for the regret. A set of Monte Carlo simulations is performed for different MAB settings to determine the minimax regret for multi-armed bandits with different configurations.
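As a rough illustration of the kind of experiment the abstract describes, the following is a minimal sketch of a batched UCB rule with a Monte Carlo estimate of its regret. It is not the authors' method: the Bernoulli reward model, the index form (sample mean plus `a * sqrt(log(t) / pulls)`), the exploration parameter `a`, and the function names `batch_ucb_regret` and `monte_carlo_regret` are all assumptions made for this example, and it does not reproduce the invariant (unit-horizon) description from the paper.

```python
import math
import random


def batch_ucb_regret(means, horizon, batch_size, a=1.0, rng=None):
    """Run one batched UCB episode on Bernoulli arms (assumed reward model).

    The arm is chosen once per batch, using only data from completed
    batches. Returns the pseudo-regret: the summed gap between the best
    arm's mean and the mean of each arm actually pulled.
    """
    rng = rng or random.Random()
    k = len(means)
    pulls = [0] * k      # number of pulls per arm
    sums = [0.0] * k     # cumulative reward per arm
    best = max(means)
    regret = 0.0
    t = 0
    while t < horizon:
        size = min(batch_size, horizon - t)
        if min(pulls) == 0:
            # Initial batches: give every arm one batch.
            arm = pulls.index(0)
        else:
            # Hypothetical UCB index; `a` scales the confidence width.
            arm = max(
                range(k),
                key=lambda i: sums[i] / pulls[i]
                + a * math.sqrt(math.log(t) / pulls[i]),
            )
        # The whole batch is processed with the chosen arm.
        reward = sum(1.0 for _ in range(size) if rng.random() < means[arm])
        pulls[arm] += size
        sums[arm] += reward
        regret += size * (best - means[arm])
        t += size
    return regret


def monte_carlo_regret(means, horizon, batch_size, runs=200, seed=0):
    """Average the pseudo-regret over independent simulated episodes."""
    rng = random.Random(seed)
    total = sum(
        batch_ucb_regret(means, horizon, batch_size, rng=rng)
        for _ in range(runs)
    )
    return total / runs
```

Calling `monte_carlo_regret([0.6, 0.4], horizon=5000, batch_size=100)` gives an average regret for one configuration. Estimating a minimax regret would additionally require maximizing this quantity over the arm means, which the sketch leaves out.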
Subject
General Physics and Astronomy
References (10 articles)
1. Auer. Using Confidence Bounds for Exploitation-Exploration Trade-offs. Journal of Machine Learning Research, 2002.