1. BERT: Pre-training of deep bidirectional transformers for language understanding; Devlin, 2018
2. ALBERT: A lite BERT for self-supervised learning of language representations; Lan, 2019
3. RoBERTa: A robustly optimized BERT pretraining approach; Liu, 2019
4. XLNet: Generalized autoregressive pretraining for language understanding; Yang, 2019
5. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter; Sanh, 2019