LUNA: Language as Continuing Anchors for Referring Expression Comprehension

Published: 2023-10-26
Container-title: Proceedings of the 31st ACM International Conference on Multimedia

Author:

Liang Yaoyuan¹, Yang Zhao², Tang Yansong¹, Fan Jiashuo¹, Li Ziran³, Wang Jingang³, Torr Philip H.S.², Huang Shao-Lun¹

Affiliation:

1. Tsinghua University, Shenzhen, China

2. University of Oxford, Oxford, United Kingdom

3. Meituan Inc., Beijing, China

Funder

National Key R&D Program of China

Shenzhen Science and Technology Program

Beijing Nova Program

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3581783.3612584

References (57 articles)

1. Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C Lawrence Zitnick, and Devi Parikh. 2015. VQA: Visual question answering. In ICCV.

2. Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. 2020. End-to-End Object Detection with Transformers. In ECCV.

3. Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, and Jingjing Liu. 2020. UNITER: Universal image-text representation learning. In ECCV.

4. Jiajun Deng, Zhengyuan Yang, Tianlang Chen, Wengang Zhou, and Houqiang Li. 2021. TransVG: End-to-end visual grounding with transformers. In ICCV.

5. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL.