1. Bartlett, P., Dani, V., Hayes, T., Kakade, S., Rakhlin, A., Tewari, A.: High-probability regret bounds for bandit online linear optimization. In: Proceedings of the 21st Annual Conference on Learning Theory, COLT 2008, pp. 335–342. Omnipress (2008)
2. Bayandina, A.S., Gasnikov, A.V., Lagunovskaya, A.A.: Gradient-free two-point methods for solving stochastic nonsmooth convex optimization problems with small non-random noises. Autom. Remote Control 79(8), 1399–1408 (2018)
3. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM (2013)
4. Beznosikov, A., Sadiev, A., Gasnikov, A.: Gradient-free methods with inexact oracle for convex-concave stochastic saddle-point problem. In: Communications in Computer and Information Science. Springer (2020)
5. Bubeck, S., Cesa-Bianchi, N.: Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Found. Trends Mach. Learn. 5(1), 1–122 (2012). https://doi.org/10.1561/2200000024