1. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, ‘‘BERT: Pre-training of deep bidirectional transformers for language understanding,’’ in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2019), Vol. 1, pp. 4171–4186.
2. T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, ‘‘Distributed representations of words and phrases and their compositionality,’’ arXiv:1310.4546 (2013).
3. M. Artetxe, G. Labaka, and E. Agirre, ‘‘Learning principled bilingual mappings of word embeddings while preserving monolingual invariance,’’ in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (2016), pp. 2289–2294.
4. T. Mikolov, Q. V. Le, and I. Sutskever, ‘‘Exploiting similarities among languages for machine translation,’’ arXiv:1309.4168 (2013).
5. J. Yamane et al., ‘‘Distributional hypernym generation by jointly learning clusters and projections,’’ in Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (2016), pp. 1871–1879.