1. Abbasi-Yadkori, Y., Bartlett, P. L., Gabillon, V., Malek, A., & Valko, M. (2018). Best of both worlds: Stochastic & adversarial best-arm identification. In Proceedings of the 31st conference on learning theory, COLT 2018, volume 75 of Proceedings of machine learning research (pp. 918–949).
2. Agarwal, A., Luo, H., Neyshabur, B., & Schapire, R. E. (2017). Corralling a band of bandit algorithms. In Proceedings of the 30th conference on learning theory, COLT 2017, volume 65 of Proceedings of machine learning research (pp. 12–38).
3. Arora, S., Hazan, E., & Kale, S. (2012). The multiplicative weights update method: A meta-algorithm and applications. Theory of Computing, 8(1), 121–164.
4. Audibert, J.-Y., & Bubeck, S. (2010). Regret bounds and minimax policies under partial monitoring. Journal of Machine Learning Research, 11, 2785–2836.
5. Auer, P., & Chiang, C.-K. (2016). An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits. In Proceedings of the 29th conference on learning theory, COLT 2016, volume 49 of Proceedings of machine learning research (pp. 116–120).