Funder
Sony Faculty Innovation award
Lilly Endowment Inc.
Indiana University Pervasive Technology Institute
NC State University
NSF-AI Institute
References: 81 articles.
Cited by: 7 articles.