1. Z. Fu, A. Kumar, A. Agarwal, et al., Coupling Vision and Proprioception for Navigation of Legged Robots, in: CVPR, 2022.
2. Sun, et al., HVLM: Exploring human-like visual cognition and language-memory network for visual dialog, IPM, 2022.
3. L. Yang, Y. Xu, C. Yuan, et al., Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, in: CVPR, 2022.
4. J. Deng, Z. Yang, T. Chen, et al., TransVG: End-to-End Visual Grounding with Transformers, in: ICCV, 2021.
5. A. Radford, J.W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al., Learning transferable visual models from natural language supervision, in: ICML, 2021.