1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding;Devlin,2019
2. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter;Sanh;ArXiv,2019
3. RoBERTa: A Robustly Optimized BERT Pretraining Approach;Liu,2019
4. XLNet: Generalized Autoregressive Pretraining for Language Understanding;Yang,2019
5. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations;Lan,2020