1. Alqahtani, S., Lalwani, G., Zhang, Y., Romeo, S., Mansour, S.: Using optimal transport as alignment objective for fine-tuning multilingual contextualized embeddings. In: EMNLP (2021)
2. Chen, L., et al.: Improving sequence-to-sequence learning via optimal transport. In: ICLR, pp. 1–16 (2019)
3. Chen, Y.C., Gan, Z., Cheng, Y., Liu, J., Liu, J.: Distilling knowledge learned in BERT for text generation. In: ACL (2020)
4. Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. In: NIPS, pp. 2292–2300 (2013)
5. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL (2019)