1. Devlin J, Chang M W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv, 2019, 1810.04805.
2. Radford A, Narasimhan K, Salimans T, et al. Improving language understanding by generative pre-training. OpenAI Technical Report, 2018. Available from: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
3. Radford A, Kim J W, Hallacy C, et al. Learning transferable visual models from natural language supervision. arXiv, 2021, 2103.00020.
4. Ramesh A, Pavlov M, Goh G, et al. Zero-shot text-to-image generation. arXiv, 2021, 2102.12092.
5. Lin J Y, Men R, Yang A, et al. M6: A Chinese multimodal pretrainer. arXiv, 2021, 2103.00823.