1. Ammar, W., Mulcaire, G., Tsvetkov, Y., Lample, G., Dyer, C., and Smith, N. A. (2016). “Massively Multilingual Word Embeddings.” arXiv preprint arXiv:1602.01925.
2. Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2017). “Enriching Word Vectors with Subword Information.” Transactions of the Association for Computational Linguistics, 5, pp. 135–146.
3. Botha, J. A., Pitler, E., Ma, J., Bakalov, A., Salcianu, A., Weiss, D., McDonald, R. T., and Petrov, S. (2017). “Natural Language Processing with Small Feed-Forward Networks.” In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2879–2885.
4. Brown, P. F., deSouza, P. V., Mercer, R. L., Della Pietra, V. J., and Lai, J. C. (1992). “Class-based n-gram Models of Natural Language.” Computational Linguistics, 18 (4), pp. 467–479.
5. Bruni, E., Boleda, G., Baroni, M., and Tran, N.-K. (2012). “Distributional Semantics in Technicolor.” In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 136–145.