An ensemble method for cross lingual semantic textual similarity

Published: 2023-09-06

Author:

Piroozfar Poorya¹, Abdous Mohammad¹, Bidgoli Behrouz Minaei¹

Affiliation:

1. Iran University of Science and Technology

Abstract

Recognizing semantic similarity between texts in different languages has become particularly important with the emergence of new natural language processing models such as ChatGPT and Bard. By identifying the semantic similarity between two texts in different languages, these models can give more accurate and comprehensive answers to users' questions. Cross-lingual semantic similarity is the task of calculating the similarity between two pieces of text written in different languages, with the degree of similarity expressed as a number between 0 and 5. Some current methods map both languages into a shared vector space, while others use machine translation to translate one text into the other language and then apply monolingual sentence similarity methods. Over the past few years, progress in transformer-based language models has paved the way for improvements in detecting text similarity. This paper presents an improved method for finding similarities between sentences in different languages: an ensemble of transformer models that determines the semantic similarity of Persian and English sentences using the Persian-English corpus. According to our findings, this ensemble approach achieves a correlation of 95.28% in detecting the degree of semantic similarity between cross-lingual sentences. These results indicate that our method surpasses previous techniques for measuring similarity between sentences in different languages.
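The abstract does not say how the ensemble combines its models or which correlation coefficient is reported. As a minimal sketch of the general idea only, the Python below assumes the simplest possible combination (an unweighted average of each model's 0 to 5 score) and assumes Pearson correlation as the evaluation measure. The function names and all scores are hypothetical and made up for illustration; they are not taken from the paper.

```python
from math import sqrt
from statistics import mean


def ensemble_scores(per_model_scores):
    """Average several models' scores per sentence pair, clipped to [0, 5].

    Assumption: unweighted averaging. The paper's actual ensembling
    strategy is not described in the abstract.
    """
    n_pairs = len(per_model_scores[0])
    if any(len(scores) != n_pairs for scores in per_model_scores):
        raise ValueError("every model must score the same sentence pairs")
    averaged = [mean(column) for column in zip(*per_model_scores)]
    return [min(5.0, max(0.0, score)) for score in averaged]


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up scores from three hypothetical models for four sentence pairs.
model_a = [4.8, 1.2, 3.1, 0.4]
model_b = [4.4, 0.8, 3.5, 0.2]
model_c = [5.2, 1.0, 2.7, 0.0]
gold = [5.0, 1.0, 3.0, 0.0]  # made-up human similarity labels

combined = ensemble_scores([model_a, model_b, model_c])
print("ensemble scores:", combined)
print("Pearson correlation with gold:", pearson(combined, gold))
```

In practice each score list would come from a fine-tuned transformer scoring Persian-English sentence pairs, and a weighted average or a learned meta-model could replace the plain mean.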

Publisher

Research Square Platform LLC
