Malayalam Natural Language Processing: Challenges in Building a Phrase-Based Statistical Machine Translation System

Author:

Sebastian Mary Priya (1), G. Santhosh Kumar (2)

Affiliation:

1. Department of Computer Science, Rajagiri School of Engineering and Technology Kerala, Rajagiri Valley, Kakkanad, Kochi, Kerala, India

2. Natural Language Processing Lab, Department of Computer Science, Cochin University of Science and Technology, Kalamasserry, Kochi, Kerala, India

Abstract

Statistical Machine Translation (SMT) is a preferred Machine Translation approach that converts text from one language into another by automatically learning translations from a parallel corpus. SMT has been successful in producing quality translations for many foreign languages, but only a few works have attempted it for South Indian languages. This article discusses experiments conducted with SMT for Malayalam and analyzes how the methods defined for SMT in foreign languages affect Malayalam, a Dravidian language. The baseline SMT model does not work for Malayalam because of the language's unique characteristics, such as its agglutinative nature and morphological richness. Hence, the challenge is to identify precisely where the SMT model has to be modified so that the baseline model accommodates these language peculiarities and gives better English-to-Malayalam translations. The alignments between English and Malayalam sentence pairs, which are subjected to the training process in SMT, play a crucial role in producing quality output translations. Therefore, this work focuses on improving the translation model of SMT by refining the alignments between English–Malayalam sentence pairs. The phrase alignment algorithms align the verb and noun phrases in the sentence pairs and develop a new set of alignments for the English–Malayalam sentence pairs. These alignment sets refine the alignments that GIZA++ produces through the EM training algorithm. The improved Phrase-Based SMT model trained using these refined alignments resulted in better translation quality, as indicated by the AER and BLEU scores.
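
The abstract reports alignment quality using AER (Alignment Error Rate). The following is a minimal sketch of how AER is commonly computed, assuming the standard definition over "sure" and "possible" gold links; the abstract does not state which variant the authors used. The function name and the example links are hypothetical and chosen only for illustration.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), where S is a subset of P.

    Each link is an (english_index, malayalam_index) pair.
    """
    a, s = set(predicted), set(sure)
    p = set(possible) | s  # every sure link also counts as a possible link
    if not a and not s:
        return 0.0  # nothing predicted and nothing expected: no error
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Hypothetical gold links for one sentence pair (not data from the article).
sure = {(0, 0), (1, 2)}
possible = {(2, 1)}

# Hypothetical alignments before and after a refinement step.
baseline = {(0, 0), (1, 1), (2, 1)}
refined = {(0, 0), (1, 2), (2, 1)}

baseline_aer = alignment_error_rate(baseline, sure, possible)
refined_aer = alignment_error_rate(refined, sure, possible)
```

A lower AER means the predicted links agree more closely with the gold alignment, so in this toy case the refined alignment scores better than the baseline.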

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science
