1. M. Ott et al., "Scaling neural machine translation," in Proc. WMT, 2018.
2. A. Vaswani et al., "Attention is all you need," in Proc. NeurIPS, 2017.
3. X. Qiu et al., "Pre-trained models for natural language processing: A survey," Sci. China Technol. Sci., 2020.
4. J. Devlin et al., "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. NAACL-HLT, 2019.
5. T. Brown et al., "Language models are few-shot learners," in Proc. NeurIPS, 2020.