1. Attention is all you need; Vaswani; Advances in Neural Information Processing Systems, 2017
2. Regularizing and optimizing LSTM language models; Merity; arXiv preprint, 2017
3. SemEval-2020 Task 4: Commonsense Validation and Explanation
4. Language models are few-shot learners; Brown; arXiv preprint, 2020
5. BERT: Pre-training of deep bidirectional transformers for language understanding; Devlin; arXiv preprint, 2018