1. BERT: Pre-training of deep bidirectional transformers for language understanding; Devlin; arXiv preprint, 2018
2. Language models are few-shot learners; Brown; Advances in Neural Information Processing Systems, 2020
3. Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model; Smith; arXiv preprint, 2022
4. Florence: A new foundation model for computer vision; Yuan; arXiv preprint, 2021
5. VL-BERT: Pre-training of generic visual-linguistic representations; Su; arXiv preprint, 2019