1. Agarwal, A., Kakade, S., Yang, L.F.: Model-based reinforcement learning with a generative model is minimax optimal. In: Conference on Learning Theory, pp. 67–83. PMLR (2020)
2. Agarwal, R., Schuurmans, D., Norouzi, M.: An optimistic perspective on offline reinforcement learning. In: International Conference on Machine Learning, pp. 104–114. PMLR (2020)
3. Ajay, A., Kumar, A., Agrawal, P., Levine, S., Nachum, O.: OPAL: Offline primitive discovery for accelerating offline reinforcement learning. arXiv preprint arXiv:2010.13611 (2020)
4. An, G., Moon, S., Kim, J.H., Song, H.O.: Uncertainty-based offline reinforcement learning with diversified Q-ensemble. Adv. Neural. Inf. Process. Syst. 34, 7436–7447 (2021)
5. Azar, M.G., Osband, I., Munos, R.: Minimax regret bounds for reinforcement learning. In: International Conference on Machine Learning, pp. 263–272. PMLR (2017)