1. Attention is all you need;Vaswani,2017
2. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin,2019
3. Language models are unsupervised multitask learners;Radford,2019
4. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension;Lewis,2020
5. DialogBERT: Discourse-aware response generation via learning to recover and rank utterances;Gu,2021