
Contrastive Adversarial Training for Multi-Modal Machine Translation

Published: 2023-06-16 | Volume: 22 | Issue: 6 | Pages: 1-18
ISSN: 2375-4699
Container-title: ACM Transactions on Asian and Low-Resource Language Information Processing
Language: en
Short-container-title: ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Authors:

Huang Xin¹, Zhang Jiajun¹, Zong Chengqing¹

Affiliation:

1. University of Chinese Academy of Sciences

Abstract

Multi-modal machine translation aims to improve translation quality with the help of additional visual input. The visual input is expected to disambiguate or complement the semantics when a sentence contains ambiguous words or incomplete expressions. Existing methods have explored many ways of fusing visual information into text representations. However, only a minority of sentences need visual information as a complement. Without guidance, models therefore tend to learn text-only translation from the majority of well-aligned translation pairs. In this article, we propose a contrastive adversarial training approach to strengthen the participation of visual information in semantic representation learning. By contrasting the multi-modal input with adversarial samples, the model learns to identify the most informative sample, namely the one paired with the congruent image and several visual objects extracted from it. This approach can prevent the visual information from being ignored and further fuses cross-modal information. We evaluate our method on three multi-modal language pairs. Experimental results show that our model improves translation accuracy, and further analysis shows that it is more sensitive to visual information.
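The abstract does not spell out the training objective. As a rough illustration only, the sketch below assumes a generic InfoNCE-style contrastive loss (a standard, swapped-in technique, not necessarily the authors' formulation): an anchor representation should score higher against the sample built with the congruent image than against adversarial samples built with mismatched images. The names `cosine`, `contrastive_loss`, `anchor`, and `temperature`, as well as the toy vectors, are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: a generic InfoNCE-style contrastive loss, assumed here
# for illustration; it is not taken from the paper.


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Negative log softmax probability of the positive among all candidates.

    anchor:    representation to be matched (hypothetical, e.g. a text encoding)
    positive:  sample built with the congruent image
    negatives: adversarial samples built with mismatched images
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]


if __name__ == "__main__":
    text = [1.0, 0.0]                        # toy anchor
    congruent = [0.9, 0.1]                   # toy congruent-image sample
    adversarial = [[0.0, 1.0], [-1.0, 0.0]]  # toy mismatched-image samples
    # Congruent sample ranked as positive: small loss.
    print(contrastive_loss(text, congruent, adversarial))
    # Mismatched sample wrongly treated as positive: large loss.
    print(contrastive_loss(text, adversarial[0], [congruent, adversarial[1]]))
```

A lower `temperature` sharpens the softmax, so the congruent sample must outscore each adversarial sample by a wider margin before the loss approaches zero; the default of 0.1 is an arbitrary illustrative choice.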

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3587267

References: 53 articles

1. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR’15). http://arxiv.org/abs/1409.0473.

2. Findings of the Third Shared Task on Multimodal Machine Translation

3. Yonatan Belinkov and Yonatan Bisk. 2018. Synthetic and natural noise both break neural machine translation. In Proceedings of the 6th International Conference on Learning Representations (ICLR’18). https://openreview.net/forum?id=BJ8vJebC-.

4. Ozan Caglayan et al. 2016. Multimodal attention for neural machine translation. CoRR.

5. Probing the Need for Visual Context in Multimodal Machine Translation

Cited by 1 article.

1. DAS-CL: Towards Multimodal Machine Translation via Dual-Level Asymmetric Contrastive Learning. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023-10-21.