PANDA: Prompt-Based Context- and Indoor-Aware Pretraining for Vision and Language Navigation

Published:2024 Page:187-200
ISSN:0302-9743
Container-title:Lecture Notes in Computer Science
language:en

Author:

Liu Ting,Hu Yue,Wu Wansen,Wang Youkai,Xu Kai,Yin Quanjun

Publisher

Springer Nature Switzerland

Link

https://link.springer.com/content/pdf/10.1007/978-3-031-53305-1_15

Reference: 40 articles.

1. Das, A., Datta, S., Gkioxari, G., Lee, S., Parikh, D., Batra, D.: Embodied question answering. In: Proceedings of CVPR, pp. 1–10 (2018)

2. Qi, Y., Wu, Q., Anderson, P., et al.: Reverie: remote embodied visual referring expression in real indoor environments. In: Proceedings of CVPR, pp. 9982–9991 (2020)

3. Lecture Notes in Computer Science;A Majumdar,2020

4. Hao, W., Li, C., Li, X., Carin, L., et al.: Towards learning a generic agent for vision-and-language navigation via pre-training. In: CVPR 2022, pp. 13134–13143. IEEE (2022)

5. Guhur, P.-L., Tapaswi, M., Chen, S., et al.: Airbert: in-domain pretraining for vision-and-language navigation. In: Proceedings of ICCV, pp. 1634–1643. IEEE (2021)

Cited by 2 articles.

1. Building Lane-Level Maps from Aerial Images;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14

2. VGDIFFZERO: Text-To-Image Diffusion Models Can Be Zero-Shot Visual Grounders;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14