Cross-lingual text similarity exploiting neural machine translation models

Published: 2020-03-18  Issue: 3  Volume: 47  Pages: 404-418
ISSN: 0165-5515
Container-title: Journal of Information Science
Language: en
Short-container-title: Journal of Information Science

Author:

Seki Kazuhiro¹

Affiliation:

1. Konan University, Japan

Abstract

This article studies cross-lingual text similarity using neural machine translation models. A straightforward approach based on machine translation is to translate the text so that the problem becomes monolingual. Another approach, recently proposed in related work, is to use the intermediate states of machine translation models, which could avoid the propagation of translation errors. We aim to improve both approaches independently and then combine the two types of information, that is, translations and intermediate states, in a learning-to-rank framework to compute cross-lingual text similarity. To evaluate the effectiveness and generalisability of our approach, we conduct experiments on English–Japanese and English–Hindi translation corpora for a cross-lingual sentence retrieval task. The results show that our approach using translations and intermediate states outperforms other neural network–based approaches and is even comparable with a strong baseline based on a state-of-the-art machine translation system.
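
As a rough illustration of the combination described in the abstract, the sketch below scores each candidate sentence with two features: a bag-of-words cosine between the translated query and the candidate text, and a cosine between vectors standing in for intermediate states of a translation model. It is not the article's implementation. The function names, the toy vectors, and the fixed weights are all assumptions made for this example; in a learning-to-rank setting the weights would be learned from training data rather than set by hand.

```python
import math
from collections import Counter


def vector_cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bow_cosine(text_a, text_b):
    """Cosine similarity between whitespace-tokenised bags of words."""
    ca = Counter(text_a.lower().split())
    cb = Counter(text_b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(c * c for c in ca.values()))
    norm_b = math.sqrt(sum(c * c for c in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query_translation, query_state, candidates, weights=(0.5, 0.5)):
    """Rank candidates by a weighted sum of the two similarity features.

    candidates: list of (candidate_id, text, state_vector) tuples.
    weights: hand-set here (an assumption); a learned ranker would replace them.
    """
    scored = []
    for cand_id, text, state in candidates:
        f_translation = bow_cosine(query_translation, text)  # translation feature
        f_state = vector_cosine(query_state, state)          # intermediate-state feature
        score = weights[0] * f_translation + weights[1] * f_state
        scored.append((cand_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: the vectors are made up and only stand in for model states.
candidates = [
    ("a", "the cat sat on the mat", [1.0, 0.0]),
    ("b", "stock prices fell sharply", [0.0, 1.0]),
]
ranked = rank_candidates("a cat sat on a mat", [0.9, 0.1], candidates)
print(ranked[0][0])
```

With these toy inputs, candidate "a" ranks first because it shares words with the translated query and its vector points in nearly the same direction as the query vector.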

Funder

Japan Science and Technology Agency

Ministry of Education, Culture, Sports, Science and Technology

Publisher

SAGE Publications

Subject

Library and Information Sciences, Information Systems

Link

http://journals.sagepub.com/doi/pdf/10.1177/0165551520912676
