1. Ahmed Abdelreheem, Kyle Olszewski, Hsin-Ying Lee, Peter Wonka, and Panos Achlioptas. 2022. ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes. arXiv preprint arXiv:2212.06250 (2022).
2. ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes
3. Eslam Bakr et al. 2022. Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding. Advances in Neural Information Processing Systems (2022).
4. Dense Events Grounding in Video
5. 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds