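1. GloVe: Global Vectors for Word Representation;Pennington;Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP),2014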
2. Scaling Up Visual and Vision-Language Representation Learning with Noisy Text Supervision;Jia;arXiv preprint,2021
3. Efficient Estimation of Word Representations in Vector Space;Mikolov;arXiv preprint,2013
4. UNITER: UNiversal Image-TExt Representation Learning;Chen;European Conference on Computer Vision (ECCV),2020
5. Unified Vision-Language Pre-Training for Image Captioning and VQA;Zhou;Proceedings of the AAAI Conference on Artificial Intelligence,2020