Enhancing Machine Translation Models through Optimal Strategies for Prior Knowledge Integration: A Systematic Review

Authors:

Asebel Muluken Hussen¹, Assefa Shimelis Getu², Haile Mesfin Abebe¹

Affiliations:

1. Adama Science and Technology University

2. University of Denver

Abstract

Machine Translation (MT) has significantly transformed cross-language communication, becoming a pivotal field in Natural Language Processing (NLP). This systematic literature review and bibliometric analysis explores optimal strategies for integrating prior knowledge into machine translation models. We examine the strategic aspects of incorporating linguistic proficiency, external semantic information, and attention mechanisms, aiming to answer the research question: "What is the best strategy for integrating prior knowledge into machine translation?" The study employs a methodically crafted search strategy covering the Scopus and IEEE Xplore databases from 2014 to 2024; a rigorous selection process following PRISMA guidelines yields 22 articles for comprehensive analysis. The literature highlights the importance of incorporating linguistic proficiency, semantic understanding, and domain-specific knowledge into machine translation models, and scholars consistently advocate leveraging both the syntactic and semantic structures of the source language to enhance model performance. The 22 selected articles present a variety of strategies, including node random dropping, shared vocabulary and cognate lexicon induction, word-level domain-sensitive representation learning, and knowledge distillation. Notably, the node random dropping strategy stands out for capturing both semantic and syntactic representations, while shared vocabulary and cognate lexicon induction proves effective for languages with shared linguistic features. In conclusion, this review identifies a consensus among scholars on the importance of integrating syntactic and semantic language structures as crucial forms of prior knowledge, and recommends employing a graph neural network on the encoder side to implement these structures effectively. This analysis contributes insights for future research on optimizing strategies for integrating prior knowledge into machine translation models.
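The abstract names node random dropping and a graph-neural-network encoder without detailing either, so the following is a minimal, illustrative Python sketch of those two ideas, not the implementation from any surveyed paper. The toy dependency graph, the drop probability, the root-preservation choice, and all function names here are assumptions made purely for illustration.

```python
import numpy as np

def node_random_drop(adjacency, keep_prob=0.9, rng=None):
    """Node random dropping as a graph augmentation (illustrative).

    adjacency: (n, n) array, nonzero where a dependency edge links two words.
    Returns the pruned adjacency and the boolean mask of kept nodes.
    """
    rng = rng or np.random.default_rng()
    n = adjacency.shape[0]
    keep = rng.random(n) < keep_prob
    keep[0] = True  # assumption: never drop the sentence root
    # Zero out every edge that touches a dropped node.
    pruned = adjacency * keep[:, None] * keep[None, :]
    return pruned, keep

def gnn_encode(embeddings, adjacency, weight, steps=2):
    """Simple message passing over the source dependency graph:
    each word averages its neighbours (plus itself) and applies a
    learned projection with a ReLU, for a fixed number of steps."""
    h = embeddings
    a = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    a = a / a.sum(axis=1, keepdims=True)        # row-normalise
    for _ in range(steps):
        h = np.maximum(a @ h @ weight, 0.0)     # ReLU(A_hat H W)
    return h

# Toy usage: 4 source words, 8-dim embeddings, a small dependency graph.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
adj = np.zeros((4, 4))
for head, dep in [(0, 1), (0, 2), (2, 3)]:  # head -> dependent edges
    adj[head, dep] = adj[dep, head] = 1.0
w = rng.normal(size=(8, 8)) * 0.1

adj_aug, kept = node_random_drop(adj, keep_prob=0.8, rng=rng)
encoded = gnn_encode(emb, adj_aug, w)
print(encoded.shape)  # (4, 8): syntax-aware word states for the decoder
```

Read charitably, dropping nodes during training keeps the encoder from over-relying on any single dependency edge, which matches the abstract's claim that this strategy captures both semantic and syntactic representations.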

Publisher

Research Square Platform LLC

