Affiliation:
1. School of Mathematics, Shandong University, Jinan 250100, China
2. Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan 250100, China
Abstract
In this paper, we study an independent Bernoulli two-armed bandit with unknown parameters $\rho$ and $\lambda$, where $\rho$ and $\lambda$ have prior distributions $R$ and $L$ such that $dR(\rho)=C_R\rho^{r_0}(1-\rho)^{r_0'}d\mu(\rho)$ and $dL(\lambda)=C_L\lambda^{l_0}(1-\lambda)^{l_0'}d\mu(\lambda)$, and $\mu$ is an arbitrary positive measure on $[0,1]$. Berry conjectured that, given a pair of prior distributions $(R,L)$ of the parameters $\rho$ and $\lambda$, the arm with prior $R$ is the current optimal choice if $r_0+r_0'<l_0+l_0'$ and the expectation of $\rho$ is not less than that of $\lambda$. We give an easily verifiable equivalent form of Berry's conjecture and use it to prove that the conjecture holds when $R$ and $L$ are two-point distributions, as well as when $R$ and $L$ are beta distributions and the number of trials N≤r0r0′+1.
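The condition in Berry's conjecture can be checked numerically for small horizons. The following is a minimal sketch, not the paper's method. It assumes that μ is Lebesgue measure, so that R is a Beta(r0+1, r0′+1) prior and L is a Beta(l0+1, l0′+1) prior, and that the objective is the expected number of successes over N pulls. The function names `value` and `q_values` are hypothetical, chosen for illustration. The code runs the standard Bayesian dynamic program with exact rational arithmetic and compares the expected total reward of pulling each arm first.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def value(a, b, c, d, n):
    """Maximal expected number of successes in n remaining pulls,
    with Beta(a, b) belief on arm R and Beta(c, d) belief on arm L
    (integer parameters; assumption: mu is Lebesgue measure)."""
    if n == 0:
        return Fraction(0)
    return max(q_values((a, b), (c, d), n))


def q_values(arm_r, arm_l, n):
    """Expected total reward over n >= 1 pulls when the first pull is
    arm R (first entry) or arm L (second entry), acting optimally after."""
    a, b = arm_r
    c, d = arm_l
    p = Fraction(a, a + b)  # posterior mean of rho
    q = Fraction(c, c + d)  # posterior mean of lambda
    pull_r = (p * (1 + value(a + 1, b, c, d, n - 1))
              + (1 - p) * value(a, b + 1, c, d, n - 1))
    pull_l = (q * (1 + value(a, b, c + 1, d, n - 1))
              + (1 - q) * value(a, b, c, d + 1, n - 1))
    return pull_r, pull_l


if __name__ == "__main__":
    # Illustrative exponents: r0 + r0' = 0 < l0 + l0' = 2, equal prior means.
    r0, r0p, l0, l0p = 0, 0, 1, 1
    for n in range(1, 6):
        qr, ql = q_values((r0 + 1, r0p + 1), (l0 + 1, l0p + 1), n)
        print(n, qr, ql, qr >= ql)
```

For these illustrative exponents both prior means equal 1/2, so the comparison isolates the effect of the smaller exponent sum. A check of this kind only tests individual instances and says nothing about the general statement.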
Funder
National Key Research and Development Program of China
Natural Science Foundation of Shandong Province
Subject
General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)
References (33 articles)
1. Thompson (1933). On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples. Biometrika.
2. Rothschild (1974). A two-armed bandit theory of market pricing. J. Econ. Theory.
3. Liberali, G.B., Hauser, J.R., and Urban, G.L. (2017). International Series in Operations Research & Management Science, Springer International Publishing.
4. Aggarwal, C.C. (2016). Recommender Systems, Springer International Publishing.
5. Berry (1972). A Bernoulli Two-armed Bandit. Ann. Math. Stat.