1. On Losses for Modern Language Models
2. Christopher Clark , Kenton Lee , Ming-Wei Chang , Tom Kwiatkowski , Michael Collins , and Kristina Toutanova . 2019. BoolQ: Exploring the surprising difficulty of natural yes/no questions. arXiv preprint arXiv:1905.10044 ( 2019 ). Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. 2019. BoolQ: Exploring the surprising difficulty of natural yes/no questions. arXiv preprint arXiv:1905.10044 (2019).
3. Yiming Cui , Wanxiang Che , Ting Liu , Bing Qin , Shijin Wang , and Guoping Hu . 2020 . Revisiting Pre-Trained Models for Chinese Natural Language Processing. In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16--20 November 2020 (Findings of ACL , Vol. EMNLP 2020), , Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, 657-- 668 . https://doi.org/10.18653/v1/2020.findings-emnlp. 58 10.18653/v1 Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, and Guoping Hu. 2020. Revisiting Pre-Trained Models for Chinese Natural Language Processing. In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16--20 November 2020 (Findings of ACL, Vol. EMNLP 2020), , Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, 657--668. https://doi.org/10.18653/v1/2020.findings-emnlp.58
4. Pre-Training with Whole Word Masking for Chinese BERT;Cui Yiming;IEEE Transactions on Audio, Speech and Language Processing. https://doi.org/10.1109/TASLP.,2021
5. Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding . In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019 , Minneapolis, MN, USA, June 2--7 , 2019, Volume 1 (Long and Short Papers), , Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, 4171--4186. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2--7, 2019, Volume 1 (Long and Short Papers), , Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, 4171--4186.