1. Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., Soricut, R.: ALBERT: a lite BERT for self-supervised learning of language representations. In: International Conference on Learning Representations (2020)
2. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT (1) (2019)
3. Dhamala, J., Sun, T., Kumar, V., Krishna, S., Pruksachatkun, Y., Chang, K.W., Gupta, R.: BOLD: dataset and metrics for measuring biases in open-ended language generation. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (2021)
4. Wang, A., Cho, K.: BERT has a mouth, and it must speak: BERT as a Markov random field language model. In: Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation, pp. 30–36 (2019)
5. Bender, E.M., Friedman, B.: Data statements for natural language processing: toward mitigating system bias and enabling better science. Trans. Assoc. Comput. Linguist. (TACL) 6, 587–604 (2018)