1. Agarwal, A., Kakade, S. M., Lee, J. D., & Mahajan, G. (2020). Optimality and approximation with policy gradient methods in Markov decision processes. In J. Abernethy & S. Agarwal (Eds.), Proceedings of the Thirty Third Conference on Learning Theory, vol. 125 of Proceedings of Machine Learning Research, pp. 64–66. PMLR.
2. Akkaya, I., Andrychowicz, M., Chociej, M., Litwin, M., McGrew, B., Petron, A., & Zhang, L. (2019). Solving Rubik’s cube with a robot hand. arXiv:1910.07113.
3. Allshire, A., Mittal, M., Lodaya, V., Makoviychuk, V., Makoviichuk, D., Widmaier, F., Wüthrich, M., Bauer, S., Handa, A., & Garg, A. (2021). Transferring dexterous manipulation from GPU simulation to a remote real-world TriFinger.
4. Amin, S., Gomrokchi, M., Satija, H., van Hoof, H., & Precup, D. (2021). A survey of exploration methods in reinforcement learning. arXiv:2109.00157.
5. Andrychowicz, O. M., Baker, B., Chociej, M., Jozefowicz, R., McGrew, B., Pachocki, J., Petron, A., Plappert, M., Powell, G., Ray, A., & Schneider, J. (2020). Learning dexterous in-hand manipulation. The International Journal of Robotics Research, 39(1), 3–20.