1. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota (2019)
2. Zhuang, L., Wayne, L., Ya, S., Jun, Z.: A robustly optimized BERT pre-training approach with post-training. In: Proceedings of the 20th Chinese National Conference on Computational Linguistics, pp. 1218–1227. Chinese Information Processing Society of China, Huhhot, China (2021)
3. Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: beyond empirical risk minimization. In: International Conference on Learning Representations, Vancouver, Canada (2018)
4. Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1746–1751. Association for Computational Linguistics, Doha, Qatar (2014)
5. Zhang, Y., Liu, Q., Song, L.: Sentence-state LSTM for text representation. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol. 1, Long Papers, pp. 317–327. Association for Computational Linguistics, Melbourne, Australia (2018)