1. Language models are few-shot learners;Brown;Advances in neural information processing systems,2020
2. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin,2018
3. Learning transferable visual models from natural language supervision;Radford
4. ImageBind: One Embedding Space to Bind Them All
5. Masked Autoencoders Are Scalable Vision Learners