VindLU: A Recipe for Effective Video-and-Language Pretraining

Published: 2023-06
Container-title: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Author:

Cheng Feng¹, Wang Xizi², Lei Jie¹, Crandall David², Bansal Mohit¹, Bertasius Gedas¹

Affiliation:

1. UNC Chapel Hill

2. Indiana University

Funder:

Sony Faculty Innovation award

Lilly Endowment Inc.

Indiana University Pervasive Technology Institute

NC State University

NSF-AI Institute

Publisher:

IEEE

References: 81 articles.

2. OmniVL: One foundation model for image-language and video-language tasks; Wang; arXiv preprint, 2022

3. An image is worth 16x16 words: Transformers for image recognition at scale; Dosovitskiy; arXiv preprint, 2020

5. Masked autoencoders as spatiotemporal learners; Feichtenhofer; arXiv preprint, 2022

Cited by 7 articles.

4. Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts; 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW); 2023-10-02

5. eP-ALM: Efficient Perceptual Augmentation of Language Models; 2023 IEEE/CVF International Conference on Computer Vision (ICCV); 2023-10-01