1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding;Devlin;arXiv preprint,2018
2. An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale;Dosovitskiy;arXiv preprint,2020
3. Language Models are Unsupervised Multitask Learners;Radford;OpenAI blog,2019
4. LLaMA: Open and Efficient Foundation Language Models;Touvron;arXiv preprint,2023
5. GPT-4 Technical Report;Achiam;arXiv preprint,2023