An ensemble method for cross-lingual semantic textual similarity

Authors:

Piroozfar Poorya¹, Abdous Mohammad¹, Bidgoli Behrouz Minaei¹

Affiliation:

1. Iran University of Science and Technology

Abstract

Recognizing the semantic similarity between texts in different languages has become particularly important with the emergence of new natural language processing models such as ChatGPT and Bard. By identifying the semantic similarity between two texts in different languages, these models can give more accurate and comprehensive answers to users' questions. Cross-lingual semantic similarity is the task of computing the similarity between two pieces of text written in different languages. This paper presents an improved method for measuring the similarity between sentences in different languages. Some existing methods map both languages into a shared vector space, while others use machine translation to translate one text into the other language and then score the pair with monolingual sentence similarity methods. The degree of similarity is expressed as a score between 0 and 5. In recent years, advances in transformer-based language models have paved the way for improvements in detecting text similarity. This article applies ensemble models built on transformers to determine the semantic similarity of Persian and English sentences using a Persian-English corpus. In our experiments, this ensemble approach achieves a correlation of 95.28% in estimating the degree of semantic similarity between cross-lingual sentence pairs. These results indicate that our method outperforms previous techniques for finding similarities between sentences in different languages.
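The abstract does not say how the ensemble combines its member models or how raw model outputs are mapped to the 0 to 5 scale. The sketch below assumes the simplest options: cosine similarity between sentence embeddings from a shared vector space, a hypothetical clip-and-stretch mapping to 0 to 5, a weighted average across member models, and Pearson correlation against gold scores as the evaluation metric. The function names, weights, and toy numbers are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (e.g. a Persian and an English sentence)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def to_sts_scale(cos_sim: float) -> float:
    """Hypothetical mapping: clip cosine to [0, 1], then stretch to the 0-5 similarity range."""
    return 5.0 * min(max(cos_sim, 0.0), 1.0)


def ensemble_score(scores: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Assumed combination rule: weighted average of per-model 0-5 scores."""
    if weights is None:
        weights = [1.0] * len(scores)  # plain mean when no weights are given
    if len(weights) != len(scores):
        raise ValueError("one weight per model score is required")
    total = sum(weights)
    if total == 0.0:
        raise ValueError("weights must not sum to zero")
    return sum(w * s for w, s in zip(weights, scores)) / total


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between predicted and gold similarity scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Toy embeddings standing in for a Persian sentence and its English paraphrase.
    fa_vec = [0.2, 0.7, 0.1]
    en_vec = [0.25, 0.65, 0.05]
    print("pair score:", round(to_sts_scale(cosine(fa_vec, en_vec)), 2))

    # Made-up 0-5 scores from two hypothetical member models on four sentence pairs.
    model_a = [4.6, 1.2, 3.1, 0.4]
    model_b = [4.2, 1.8, 2.7, 0.9]
    gold = [5.0, 1.0, 3.0, 0.0]
    combined = [ensemble_score([a, b]) for a, b in zip(model_a, model_b)]
    print("ensemble scores:", combined)
    print("Pearson r:", round(pearson(combined, gold), 4))
```

Averaging is only one way to build an ensemble; learned weights or a stacked regressor over member outputs would slot into `ensemble_score` without changing the evaluation step.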

Publisher

Research Square Platform LLC
