Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture

Published: 2022-06
Container-title: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Author:

Gupta Tanmay¹, Kamath Amita¹, Kembhavi Aniruddha¹, Hoiem Derek²

Affiliation:

1. PRIOR@Allen Institute for AI

2. University of Illinois at Urbana-Champaign

Funder

ONR

Publisher

IEEE

Link

http://xplorestaging.ieee.org/ielx7/9878378/9878366/09880087.pdf?arnumber=9880087

References: 59 articles.

1. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

2. Neural Baby Talk

3. 12-in-1: Multi-task vision and language representation learning;Lu;2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

4. ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks;Lu;NeurIPS

Cited by 12 articles.

1. DJUHNet: A deep representation learning-based scheme for the task of joint image upsampling and hashing;Signal Processing: Image Communication;2024-11

2. RoboVQA: Multimodal Long-Horizon Reasoning for Robotics;2024 IEEE International Conference on Robotics and Automation (ICRA);2024-05-13

3. Multimodal Foundation Models: From Specialists to General-Purpose Assistants;Foundations and Trends® in Computer Graphics and Vision;2024

4. CPT: Colorful Prompt Tuning for pre-trained vision-language models;AI Open;2024

5. A Survey of Computer Vision Technologies in Urban and Controlled-environment Agriculture;ACM Computing Surveys;2023-11-27