1. Achiam, J., Knight, E., & Abbeel, P. (2019). Towards characterizing divergence in deep Q-learning. arXiv preprint arXiv:1903.08894.
2. Ahn, M., Brohan, A., Brown, N., Chebotar, Y., Cortes, O., David, B., Finn, C., Fu, C., Gopalakrishnan, K., Hausman, K., Herzog, A., Ho, D., Hsu, J., Ibarz, J., Ichter, B., Irpan, A., Jang, E., Ruano, R.J., Jeffrey, K., Jesmonth, S., Joshi, N., Julian, R., Kalashnikov, D., Kuang, Y., Lee, K.-H., Levine, S., Lu, Y., Luu, L., Parada, C., Pastor, P., Quiambao, J., Rao, K., Rettinghouse, J., Reyes, D., Sermanet, P., Sievers, N., Tan, C., Toshev, A., Vanhoucke, V., Xia, F., Xiao, T., Xu, P., Xu, S., Yan, M., & Zeng, A. (2022). Do as I can, not as I say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691.
3. Andrychowicz, M., Raichuk, A., Stańczyk, P., Orsini, M., Girgin, S., Marinier, R., Hussenot, L., Geist, M., Pietquin, O., Michalski, M., Gelly, S., & Bachem, O. (2021). What matters in on-policy reinforcement learning? A large-scale empirical study. In Proceedings of international conference on learning representations (ICLR).
4. Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., & Zaremba, W. (2017). Hindsight experience replay. In Proceedings of advances in neural information processing systems (NeurIPS) (Vol. 30).
5. Kumar, A., Agarwal, R., Ghosh, D., & Levine, S. (2021). Implicit under-parameterization inhibits data-efficient deep reinforcement learning. In Proceedings of international conference on learning representations (ICLR).