Code‐mixed Hindi‐English text correction using fuzzy graph and word embedding

Published: 2023-05-14
ISSN: 0266-4720
Container-title: Expert Systems
Language: en

Author:

Jain Minni¹, Jindal Rajni¹, Jain Amita²

Affiliation:

1. Computer Science and Engineering, Delhi Technological University, New Delhi, India

2. Computer Science and Engineering, Netaji Subhas University of Technology Delhi, New Delhi, India

Abstract

Interaction via social media involves frequent code‐mixed text, spelling errors and noisy elements, which create a bottleneck for the performance of natural language processing applications. This work presents the first approach for code‐mixed Hindi‐English social media text that combines language identification with the detection and correction of both non‐word (out‐of‐vocabulary) errors and real‐word errors occurring simultaneously. Each identified language (Devanagari Hindi, Roman Hindi, and English) has its own complexities and challenges. Errors are detected separately for each language, and a list of candidate suggestions is created for each erroneous word. A fuzzy graph is then built over the words in these suggestion lists using various semantic relations from Hindi WordNet. Word embeddings and fuzzy graph‐based centrality measures are used to select the correct word. Several experiments are performed on social media datasets collected from Instagram, Twitter, YouTube comments, blogs, and WhatsApp. The experimental results show that the proposed system corrects out‐of‐vocabulary words and real‐word errors with a maximum recall of 0.90 and 0.67, respectively, for Dev_Hindi, and 0.87 and 0.66, respectively, for Rom_Hindi. The proposed method is also applied to state‐of‐the‐art sentiment analysis approaches, where it visibly improves the F1‐score.
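The abstract does not give the exact fuzzy graph construction or scoring formula, so the following is only a minimal sketch of the general idea under stated assumptions. Each vertex is an (erroneous token, candidate) pair whose membership is assumed to be 1 minus the normalized edit distance. Edges link candidates for different erroneous tokens when a semantic relation exists; the relation weights in `RELATION_WEIGHT` are hypothetical stand‐ins for Hindi WordNet relations, and edge memberships are capped at the smaller vertex membership, as a fuzzy graph requires. Each candidate is then scored by a blend (hypothetical parameter `alpha`) of normalized fuzzy degree centrality and cosine similarity to context‐word embeddings. All function names, tokens, relations, and vectors below are invented for illustration.

```python
import math
from itertools import combinations

# Hypothetical weights standing in for Hindi WordNet semantic relations;
# the paper's actual relation set and weights are not given in the abstract.
RELATION_WEIGHT = {"synonym": 1.0, "hypernym": 0.8, "related": 0.6}


def edit_similarity(a: str, b: str) -> float:
    """1 - normalized Levenshtein distance, so the result lies in [0, 1]."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b), 1)


def cosine(u, v) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def build_fuzzy_graph(suggestions, relations):
    """Vertices are (erroneous_token, candidate) pairs with membership sigma;
    edge membership mu = weight * min(sigma(u), sigma(v)) never exceeds min(sigma)."""
    sigma = {(tok, c): edit_similarity(tok, c)
             for tok, cands in suggestions.items() for c in cands}
    mu = {}
    for u, v in combinations(sigma, 2):
        if u[0] == v[0]:
            continue  # candidates for the same erroneous token compete, so no edge
        rel = relations.get((u[1], v[1])) or relations.get((v[1], u[1]))
        if rel:
            mu[(u, v)] = RELATION_WEIGHT[rel] * min(sigma[u], sigma[v])
    return sigma, mu


def fuzzy_degree(mu):
    """Fuzzy degree centrality: sum of the memberships of incident edges."""
    deg = {}
    for (u, v), w in mu.items():
        deg[u] = deg.get(u, 0.0) + w
        deg[v] = deg.get(v, 0.0) + w
    return deg


def correct(suggestions, relations, embeddings, context, alpha=0.5):
    """Per erroneous token, pick the candidate maximizing a blend of normalized
    fuzzy degree and best cosine similarity to the context words."""
    _, mu = build_fuzzy_graph(suggestions, relations)
    deg = fuzzy_degree(mu)
    top = max(deg.values(), default=0.0) or 1.0  # avoid dividing by zero
    ctx = [embeddings[w] for w in context if w in embeddings]
    result = {}
    for tok, cands in suggestions.items():
        def score(c):
            sim = (max((cosine(embeddings[c], x) for x in ctx), default=0.0)
                   if c in embeddings else 0.0)
            return alpha * deg.get((tok, c), 0.0) / top + (1 - alpha) * sim
        result[tok] = max(cands, key=score)
    return result


# Toy Roman Hindi example: both candidates in each list are one edit away,
# so edit distance alone cannot decide; graph and embeddings break the tie.
suggestions = {"khna": ["khana", "kahna"], "pni": ["pani", "puni"]}
relations = {("khana", "pani"): "related"}  # invented relation for illustration
embeddings = {"khana": [1.0, 0.0], "kahna": [0.0, 1.0], "pani": [0.9, 0.1],
              "puni": [0.0, 1.0], "bhook": [1.0, 0.1]}  # invented 2-d vectors
print(correct(suggestions, relations, embeddings, context=["bhook"]))
# {'khna': 'khana', 'pni': 'pani'}
```

Because `khana` and `pani` are connected in the toy graph and also lie close to the context word `bhook` in the toy embedding space, both signals favour them over the equally close (by edit distance) alternatives. A real system would use trained embeddings and the actual Hindi WordNet relations instead of these hand‐made values.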

Publisher

Wiley

Subject

Artificial Intelligence, Computational Theory and Mathematics, Theoretical Computer Science, Control and Systems Engineering

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1111/exsy.13328

Cited by 1 article

1. Special issue on International conference on computing and communication networks (ICCCN2022);Expert Systems;2023-11-02