1. Fu, J., Kumar, A., Nachum, O., Tucker, G., Levine, S.: D4RL: datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219 (2020)
2. Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897. PMLR (2015)
3. Zhuang, Z., Lei, K., Liu, J., Wang, D., Guo, Y.: Behavior proximal policy optimization. arXiv preprint arXiv:2302.11312 (2023)
4. Pomerleau, D.A.: ALVINN: an autonomous land vehicle in a neural network. In: Advances in Neural Information Processing Systems, vol. 1 (1988)
5. Jiang, Y., et al.: VIMA: general robot manipulation with multimodal prompts. arXiv preprint arXiv:2210.03094 (2022)