1. Abbasi-Yadkori, Y., Pál, D., Szepesvári, C.: Improved algorithms for linear stochastic bandits. In: Advances in Neural Information Processing Systems, pp. 2312–2320 (2011)
2. Agarwal, A., Luo, H., Neyshabur, B., Schapire, R.E.: Corralling a band of bandit algorithms. In: Conference on Learning Theory, pp. 12–38. PMLR (2017)
3. Arora, R., Marinov, T.V., Mohri, M.: Corralling stochastic bandit algorithms. In: International Conference on Artificial Intelligence and Statistics, pp. 2116–2124. PMLR (2021)
4. Ayoub, A., Jia, Z., Szepesvári, C., Wang, M., Yang, L.F.: Model-based reinforcement learning with value-targeted regression. arXiv preprint arXiv:2006.01107 (2020)
5. Balakrishnan, S., Wainwright, M.J., Yu, B.: Statistical guarantees for the EM algorithm: from population to sample-based analysis. Ann. Stat. 45(1), 77–120 (2017)