Translating Low-Resource Languages by Vocabulary Adaptation from Close Counterparts

Author:

Peyman Passban¹, Qun Liu², Andy Way²

Affiliation:

1. ADAPT Centre, School of Computing, Dublin City University, Ireland

2. ADAPT Centre, School of Computing, Dublin City University, Ireland

Abstract

Some natural languages belong to the same family or share similar syntactic and/or semantic regularities. This property encourages researchers to share computational models across languages, using high-quality models to boost low-performing counterparts. In this article, we follow a similar idea, whereby we develop statistical and neural machine translation (MT) engines that are trained on one language pair but are used to translate another language. First we train a reliable model for a high-resource language, and then we exploit cross-lingual similarities and adapt the model to work for a close language with almost zero resources. We chose Turkish (Tr) and Azeri (Azerbaijani; Az) as the language pair in our experiments. Azeri suffers from a lack of resources, as there is almost no bilingual corpus available for it. Via our techniques, we are able to train an engine for the Az → English (En) direction that outperforms all other existing models.
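The adaptation idea in the abstract relies on Azeri and Turkish words often differing only by small spelling changes, so Azeri input can be mapped onto the vocabulary of a trained Tr → En model before translation. A minimal sketch of that vocabulary-mapping step, using string similarity as a stand-in for whatever mapping the authors actually learn (the vocabulary, cutoff, and example words here are illustrative assumptions, not taken from the paper):

```python
import difflib

def adapt_vocabulary(sentence, target_vocab, cutoff=0.6):
    """Map each token to its closest surface match in the high-resource
    (e.g. Turkish) vocabulary; keep the token unchanged when nothing
    in the vocabulary is similar enough."""
    adapted = []
    for token in sentence.split():
        matches = difflib.get_close_matches(token, target_vocab,
                                            n=1, cutoff=cutoff)
        adapted.append(matches[0] if matches else token)
    return " ".join(adapted)

# Toy Turkish vocabulary and an Azeri-like input with small
# surface differences (both purely illustrative).
turkish_vocab = ["kitap", "okul", "ev", "geliyorum"]
print(adapt_vocabulary("kitab ev", turkish_vocab))  # "kitap ev"
```

The mapped sentence would then be fed to the existing Tr → En engine; the low-resource side thus borrows the high-resource model wholesale instead of being trained from scratch.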

Funder

Science Foundation Ireland (ADAPT Centre for Digital Content Platform Research)

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Cited by 15 articles.

1. AAVE Corpus Generation and Low-Resource Dialect Machine Translation;Proceedings of the 7th ACM SIGCAS/SIGCHI Conference on Computing and Sustainable Societies;2024-07-08

2. Unveiling Sentiments: A Deep Dive Into Sentiment Analysis for Low-Resource Languages—A Case Study on Hausa Texts;IEEE Access;2024

3. eGRUMET: Enhanced Gated Recurrent Unit Machine for English to Kannada lingual Translation;2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT);2023-07-06

4. Enriching the transfer learning with pre-trained lexicon embedding for low-resource neural machine translation;Tsinghua Science and Technology;2022-02

5. Recent advances of low-resource neural machine translation;Machine Translation;2021-10-30
