1. Altschuler, J., Weed, J., Rigollet, P.: Near-linear time approximation algorithms for optimal transport via sinkhorn iteration. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 1961–1971 (2017)
2. Berzak, Y., Malmaud, J., Levy, R.: STARC: structured annotations for reading comprehension. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5726–5735 (2020)
3. Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. In: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, pp. 2292–2300 (2013)
4. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186 (2019)
5. Huang, Z., Yu, P., Allan, J.: Improving cross-lingual information retrieval on low-resource languages via optimal transport distillation. In: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, pp. 1048–1056 (2023)