1. Wang, B., Wang, S., Cheng, Y., Gan, Z., Jia, R., Li, B. and Liu, J. (2020). InfoBERT: Improving robustness of language models from an information theoretic perspective. arXiv preprint arXiv:2010.02329.
2. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., …, Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
3. Li, X., Yan, H., Qiu, X. and Huang, X. (2020). FLAT: Chinese NER using flat-lattice transformer. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020).
4. Yin, W., Radev, D. and Xiong, C. (2021). DocNLI: A large-scale dataset for document-level natural language inference. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.