1. Jean-Yves Audibert and Sébastien Bubeck. 2010. Best Arm Identification in Multi-Armed Bandits. In COLT - 23rd Conference on Learning Theory - 2010. Haifa, Israel, 13 p. https://enpc.hal.science/hal-00654404
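2. Baruch Awerbuch and Robert D. Kleinberg. 2004. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing (STOC '04).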
3. Viktor Bengs, Aadirupa Saha, and Eyke Hüllermeier. 2022. Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models. In Proceedings of the 39th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 162), Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato (Eds.). PMLR, 1764–1786. https://proceedings.mlr.press/v162/bengs22a.html