1. Abdou, M., Glončák, V., and Bojar, O. (2017). “Variable Mini-Batch Sizing and Pre-Trained Embeddings.” In Proceedings of the 2nd Conference on Machine Translation, pp. 680–686, Copenhagen, Denmark. Association for Computational Linguistics.
2. Arora, S., Liang, Y., and Ma, T. (2017). “A Simple but Tough-to-Beat Baseline for Sentence Embeddings.” In the 5th International Conference on Learning Representations.
3. Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2017). “Enriching Word Vectors with Subword Information.” Transactions of the Association for Computational Linguistics, 5, pp. 135–146.
4. Bojar, O., Chatterjee, R., Federmann, C., Graham, Y., Haddow, B., Huang, S., Huck, M., Koehn, P., Liu, Q., Logacheva, V., Monz, C., Negri, M., Post, M., Rubino, R., Specia, L., and Turchi, M. (2017). “Findings of the 2017 Conference on Machine Translation (WMT17).” In Proceedings of the 2nd Conference on Machine Translation, pp. 169–214, Copenhagen, Denmark. Association for Computational Linguistics.
5. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., and Amodei, D. (2020). “Language Models are Few-Shot Learners.” In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (Eds.), Advances in Neural Information Processing Systems, Vol. 33, pp. 1877–1901. Curran Associates, Inc.