1. Abolghasemi, P., Mazaheri, A., Shah, M., et al. (2019). Pay attention! Robustifying a deep visuomotor policy through task-focused visual attention. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4254–4262).
2. Ahn, M., Brohan, A., Brown, N., et al. (2022). Do as I can, not as I say: Grounding language in robotic affordances. arXiv:2204.01691.
3. Alayrac, J. B., Donahue, J., Luc, P., et al. (2022). Flamingo: A visual language model for few-shot learning. Advances in Neural Information Processing Systems, 35, 23716–23736.
4. Anderson, P., Shrivastava, A., Parikh, D., et al. (2019). Chasing ghosts: Instruction following as Bayesian state tracking. In Advances in neural information processing systems (Vol. 32).
5. Antol, S., Agrawal, A., Lu, J., et al. (2015). VQA: Visual question answering. In Proceedings of the IEEE international conference on computer vision (pp. 2425–2433).