1. Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: Danyluk, A.P., Bottou, L., Littman, M.L. (eds.) Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14–18, 2009. ACM International Conference Proceeding Series, vol. 382, pp. 41–48. ACM (2009). https://doi.org/10.1145/1553374.1553380
2. Chorowski, J., Jaitly, N.: Towards better decoding and language model integration in sequence to sequence models. In: INTERSPEECH (2017)
3. Clark, K., Luong, M., Le, Q.V., Manning, C.D.: ELECTRA: pre-training text encoders as discriminators rather than generators. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26–30, 2020. OpenReview.net (2020). https://openreview.net/forum?id=r1xMH1BtvB
4. Fang, Y., Sun, S., Gan, Z., Pillai, R., Wang, S., Liu, J.: Hierarchical graph network for multi-hop question answering. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 8823–8838. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.emnlp-main.710
5. Gao, Y., Wang, W., Herold, C., Yang, Z., Ney, H.: Towards a better understanding of label smoothing in neural machine translation. In: Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, pp. 212–223. Association for Computational Linguistics, Suzhou, China (2020). https://aclanthology.org/2020.aacl-main.25