Affiliation:
1. Sichuan University
2. Monash University
3. Sun Yat-sen University
Funder:
National Natural Science Foundation of China