Towards Developing Uniform Lexicon Based Sorting Algorithm for Three Prominent Indo-Aryan Languages

Published: 2022-05-31; Volume: 21; Issue: 3; Pages: 1-20
ISSN: 2375-4699
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Language: English
Abbreviated journal title: ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Authors:

Mir Ragib Ishraq¹, Nitesh Khadka¹, Asif Mohammed Samir², M. Shahidur Rahman¹

Affiliations:

1. Department of Computer Science and Engineering, Shahjalal University of Science and Technology, Sylhet, Bangladesh

2. Institute of Information and Communication Technology, Shahjalal University of Science and Technology, Sylhet, Bangladesh

Abstract

Three Indic (Indo-Aryan) languages, Bengali, Hindi and Nepali, are explored here at the character level to identify their similarities and differences. Because they share a common root, Sanskrit, the Indic languages bear common characteristics. This gives computer and language scientists the opportunity to develop common Natural Language Processing (NLP) techniques and algorithms for them. With this in mind, we compare and analyze these three languages character by character. As an application of this hypothesis, we also developed a uniform sorting algorithm in two steps: first for Bengali and Nepali only, and then extended to Hindi. Our thorough investigation with more than 30,000 words from each language suggests that the algorithm achieves full accuracy with respect to the ordering set by the local language authorities of each language, while maintaining good efficiency.
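The abstract does not spell out the algorithm itself. As a rough illustration of why a uniform, character-level treatment of these scripts is plausible, the following hypothetical Python sketch (not the authors' method) builds a single sort key for Bengali words and for Devanagari words (the script used by Hindi and Nepali). It relies on the Unicode Bengali (U+0980-U+097F + 0x80) and Devanagari (U+0900-U+097F) blocks following a common ISCII-derived layout, so most corresponding letters sit at the same offset within their block. The names `shared_offset`, `collation_key` and `uniform_sort` are assumptions introduced here. Plain offset order is only a starting point: it does not encode the dictionary rules set by the language authorities for vowel signs, the virama, conjuncts or nasalisation marks, which a full lexicon-based algorithm would have to add.

```python
from __future__ import annotations

# Hypothetical sketch, NOT the authors' algorithm: a script-independent sort
# key for Bengali and Devanagari (Hindi/Nepali) words. It assumes that
# corresponding letters share the same offset inside their Unicode block.
DEVANAGARI_START = 0x0900  # Devanagari block: U+0900-U+097F
BENGALI_START = 0x0980     # Bengali block:    U+0980-U+09FF
BLOCK_LEN = 0x80


def shared_offset(ch: str) -> int:
    """Return the character's offset inside the Devanagari or Bengali block.

    Characters outside both blocks are placed after every Indic offset,
    ordered among themselves by code point.
    """
    cp = ord(ch)
    for start in (DEVANAGARI_START, BENGALI_START):
        if start <= cp < start + BLOCK_LEN:
            return cp - start
    return BLOCK_LEN + cp


def collation_key(word: str) -> tuple[int, ...]:
    """Sequence of in-block offsets; identical for parallel spellings."""
    return tuple(shared_offset(ch) for ch in word)


def uniform_sort(words: list[str]) -> list[str]:
    """Sort words from either script with the same key function."""
    return sorted(words, key=collation_key)


if __name__ == "__main__":
    # Bengali: book, pen, mango
    print(uniform_sort(["বই", "কলম", "আম"]))   # ['আম', 'কলম', 'বই']
    # Hindi: bus, pen, mango
    print(uniform_sort(["बस", "कलम", "आम"]))   # ['आम', 'कलम', 'बस']
```

Because Bengali "কলম" and Hindi "कलम" map to the same key, one comparison routine serves both scripts; language-specific exceptions would then be layered on top of this shared ordering rather than written separately for each language.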

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3488371

References: 43 articles