Identifying and Analyzing Reduplication Multiword Expressions in Hindi Text Using Machine Learning

Published: 2023-08-28 Page: 1732-1741
ISSN: 2217-8333
Container-title: TEM Journal
Language: en

Author:

Mishra Atul¹, Mishra Alok²

Affiliation:

1. BML Munjal University, India

2. Faculty of Engineering, NTNU-Norwegian University of Science and Technology, Norway

Abstract

Identifying and analyzing Reduplication Multiword Expressions (RMWEs) is a Natural Language Processing (NLP) task that involves extracting repeated words from various forms of text and classifying them as onomatopoeic, non-onomatopoeic, partial, or semantic. With the increasing use of low-resource languages in news, opinions, comments, hashtags, reviews, posts, and journals, this study proposes a machine learning-based RMWE identification method for Hindi text. The method employs linguistic patterns and statistical data, along with a proposed threshold boundary detection step in statistical filtering. The Jaccard distance (a dissimilarity measure) and the Sørensen-Dice coefficient (a similarity measure) are used for semantic relation analysis. The proposed approach was evaluated on the publicly available Hindi corpus from IITB, measuring performance between two consecutive thresholds with the lowest error and highest recall. The experimental results highlight the viability and utility of the method for Indian computational linguistics and offer a blueprint for current procedures.
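
The two set-based measures named in the abstract, and the simplest case of reduplication (a token immediately followed by an identical token), can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the whitespace tokenization, and the choice of which sets to compare are assumptions made for illustration, and the paper's linguistic patterns, statistical filtering, and threshold boundary detection are not reproduced here.

```python
def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance: 1 - |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dice_coefficient(a: set, b: set) -> float:
    """Sørensen-Dice coefficient: 2|A ∩ B| / (|A| + |B|) (1.0 if both sets are empty)."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return 2.0 * len(a & b) / total


def find_complete_reduplications(tokens: list) -> list:
    """Return (index, token) for every token that is immediately repeated.

    Hypothetical helper: it only covers complete reduplication and ignores
    the partial and semantic types mentioned in the abstract.
    """
    return [
        (i, tok)
        for i, (tok, nxt) in enumerate(zip(tokens, tokens[1:]))
        if tok == nxt
    ]


if __name__ == "__main__":
    # Example sentence chosen for illustration; "धीरे धीरे" is a repeated word pair.
    tokens = "वह धीरे धीरे चल रहा था".split()
    print(find_complete_reduplications(tokens))

    # Toy sets standing in for whatever features are compared.
    a, b = {1, 2, 3}, {2, 3, 4}
    print(jaccard_distance(a, b), dice_coefficient(a, b))
```

The two measures are monotonically related: with Jaccard similarity J = 1 - distance, the Dice coefficient equals 2J / (1 + J), so they rank pairs identically and differ only in scale.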

Publisher

Association for Information Communication Technology Education and Science (UIKTEN)

Subject

Management of Technology and Innovation, Information Systems and Management, Strategy and Management, Education, Information Systems, Computer Science (miscellaneous)

Link

https://www.temjournal.com/content/123/TEMJournalAugust2023_1732_1741.pdf

Cited by 2 articles.

1. Bengali reduplication generation with finite-state transducers (FSTs);International Journal of Speech Technology;2024-08-05

2. An Approach to Identify the Complete Reduplicated Multiword Expressions in Digital Bengali Text;Journal of The Institution of Engineers (India): Series B;2024-07-15