Word Sense Disambiguation applied to Assamese-Hindi Bilingual Statistical Machine Translation
Published: 2024-02-08
Issue: 1
Volume: 14
Page: 12581-12586
ISSN: 1792-8036
Container-title: Engineering, Technology & Applied Science Research
Short-container-title: Eng. Technol. Appl. Sci. Res.
Author: Barman Anup Kumar, Sarmah Jumi, Basimatary Subungshri, Nag Amitava
Abstract
Word Sense Disambiguation (WSD) is the task of automatically assigning the appropriate sense to an ambiguous word. It plays a crucial role in many Natural Language Processing (NLP) applications. A Statistical Machine Translation (SMT) system translates a source language into a target language using phrase-based statistical models. WSD and MT are closely linked, because a source-language word may be associated with multiple translations in the target language. This study applies WSD to the input of the MT system to improve the disambiguation of its output. The most frequent synonym was selected from Hindi WordNet to obtain the most accurate translation. The study also compared Naïve Bayes (NB) and Decision Tree (DT) classifiers for building and testing a WSD model. When evaluated in the Weka machine learning toolkit, NB was more appropriate for the WSD task than DT. To the best of our knowledge, no such work has yet been carried out for Assamese, an Indo-Aryan language. The MT system with the WSD module achieved better results than the baseline MT system without it. The results were analyzed by linguists. Furthermore, an Assamese-Hindi transliteration system was merged with the baseline MT system to translate proper nouns. This study contributes to NLP for Assamese, an Indian language with limited computational resources.
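The two steps named in the abstract, classifying the sense of an ambiguous word with Naïve Bayes and then picking the most frequent synonym for that sense, can be illustrated with a minimal sketch. This is not the authors' implementation: the study used the Weka toolkit and Hindi WordNet, whereas the code below is plain Python with add-one smoothing, and the training contexts, sense labels, synonym names, and frequency counts are all hypothetical placeholders chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Count senses and context words from (context_words, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        for w in words:
            word_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, word_counts, vocab


def predict_sense(model, context_words):
    """Return the sense with the highest log-probability (add-one smoothing)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior of the sense
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context_words:
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


def most_frequent_synonym(synonyms, corpus_freq):
    """Pick the synonym with the highest corpus frequency (0 if unseen)."""
    return max(synonyms, key=lambda s: corpus_freq.get(s, 0))


# Hypothetical toy data: contexts of one ambiguous word with two senses.
TRAIN = [
    (["money", "deposit", "account"], "finance"),
    (["loan", "money", "interest"], "finance"),
    (["river", "water", "shore"], "river"),
    (["fishing", "river", "mud"], "river"),
]
# Hypothetical target-language synonyms per sense, with made-up frequencies.
SYNONYMS = {
    "finance": ["fin_syn_a", "fin_syn_b"],
    "river": ["riv_syn_a", "riv_syn_b"],
}
FREQ = {"fin_syn_a": 3, "fin_syn_b": 7, "riv_syn_a": 5, "riv_syn_b": 2}

MODEL = train_nb(TRAIN)


def translate_ambiguous(context_words):
    """Disambiguate from context, then choose the most frequent synonym."""
    sense = predict_sense(MODEL, context_words)
    return sense, most_frequent_synonym(SYNONYMS[sense], FREQ)
```

Calling `translate_ambiguous(["deposit", "money"])` first resolves the sense from the context words and then returns the highest-frequency synonym for that sense. A real system would replace the toy lists with sense-annotated Assamese contexts and synonym sets drawn from a WordNet.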
Publisher
Engineering, Technology & Applied Science Research
Cited by
2 articles.
1. Stance Detection in Hinglish Data using the BART-large-MNLI Integration Model; Engineering, Technology & Applied Science Research; 2024-08-02
2. Leveraging Bilingual Dictionaries for Improved Setswana-English Machine Translation: A Context-Aware Model; 2024 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD); 2024-08-01