1. Miles: Visual bert pre-training with injected language semantics for video-text retrieval;ge;ArXiv Preprint,2022
2. Zero-shot text-to-image generation;ramesh;International Conference on Machine Learning,0
3. Bert: Pre-training of deep bidirectional transformers for language understanding;devlin;NAACL,2019
4. HOP: History-and-Order Aware Pretraining for Vision-and-Language Navigation
5. Airbert: In-domain Pretraining for Vision-and-Language Navigation