1. Cui, Y., et al.: Pre-training with whole word masking for Chinese BERT. arXiv preprint arXiv:1906.08101 (2019)
2. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
3. Paranjape, B.: Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence) (2018)
4. Santos, O.C.: Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence) (2013)
5. Sun, Y., et al.: ERNIE: enhanced representation through knowledge integration. arXiv preprint arXiv:1904.09223 (2019)