1. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Preprint at http://arxiv.org/abs/1810.04805 (2019).
2. Radford, A., Narasimhan, K., Salimans, T. & Sutskever, I. Improving Language Understanding by Generative Pre-Training. OpenAI (2018).
3. Sun, C., Qiu, X., Xu, Y. & Huang, X. How to Fine-Tune BERT for Text Classification? Preprint at http://arxiv.org/abs/1905.05583 (2020).
4. Xu, H., Liu, B., Shu, L. & Yu, P. S. BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis. Preprint at http://arxiv.org/abs/1904.02232 (2019).
5. Dathathri, S. et al. Plug and Play Language Models: A Simple Approach to Controlled Text Generation. Preprint at http://arxiv.org/abs/1912.02164 (2020).