Neural Machine Translation and Post-editing: Improving translation of domain specific English terms for a low resource language

Author:

Asadi Vahid (1), Shokpour Nasrin (2), O'Neill Shirley (3), Dann Christopher (3), Wang Jenny (3)

Affiliation:

1. Dublin City University

2. Shiraz University of Medical Sciences

3. University of Southern Queensland

Abstract

Neural Machine Translation (NMT) has substantially improved output quality in many respects, but problems remain when translating domain-specific terms, particularly when the data are heterogeneous or contain rare phrases. To explore this, a translation system for the English-Persian language pair was considered, since low-resource languages like Persian require more effort and attention to translate accurately in specific domains. An aligned parallel terminology database was created for the Information Technology domain. Data were selected from a collection of parallel sentences in the OPUS repository, and terms in the source and target languages were identified and annotated manually. After an NMT model was trained, the system was manually evaluated on how effectively instance selection retrieved sentences. Its use led to an overall improvement in post-editing and thus to better translations. Besides giving insight into errors and system performance, this method provides guidance for setting priorities for future extensions and improvements to machine translation. It also sheds light on the use of instance selection across different scenarios for low-resource languages like Persian, with the aim of achieving high overall term translation accuracy.

Publisher

Research Square Platform LLC
