1. Agrawal, S., Goyal, N.: Analysis of Thompson sampling for the multi-armed bandit problem. In: COLT, p. 39.1–39.26 (2012)
2. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002)
3. Auer, P., Chiang, C.-K., Ortner, R., Drugan, M.M.: Pareto front identification from stochastic bandit feedback. In: AISTATS, pp. 939–947 (2016)
4. Benabbou, N., Perny, P.: Combining preference elicitation and search in multiobjective state-space graphs. In: IJCAI, pp. 297–303 (2015)
5. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)