Affiliation:
1. Department of Computer Science and Engineering, Shahjalal University of Science and Technology, Sylhet, Bangladesh
2. Institute of Information and Communication Technology, Shahjalal University of Science and Technology, Sylhet, Bangladesh
Abstract
Three Indic (Indo-Aryan) languages, Bengali, Hindi and Nepali, are explored here at the character level to identify their similarities and dissimilarities. Because they share a common root, Sanskrit, Indic languages have many characteristics in common, which gives computer and language scientists an opportunity to develop common Natural Language Processing (NLP) techniques or algorithms. With this in mind, we compare and analyze the three languages character by character. As an application of this hypothesis, we also developed a uniform sorting algorithm in two steps: first for Bengali and Nepali only, then extended to Hindi. Our investigation with more than 30,000 words from each language suggests that the algorithm achieves total accuracy with respect to the ordering set by the language authorities of the respective languages, along with good efficiency.
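The idea of one sorting routine shared across scripts can be illustrated with a minimal sketch. It is not the algorithm developed in this work, whose details are not given in the abstract. It assumes only that the Unicode Devanagari block (U+0900 to U+097F, used for Hindi and Nepali) and the Bengali block (U+0980 to U+09FF) place corresponding letters at the same offset, and that code point order is an acceptable stand-in for dictionary order, which the official collation rules of each language may not fully match. The names `block_offset`, `uniform_key` and `uniform_sort` are hypothetical.

```python
# Illustrative sketch only: a script-independent sort key based on the
# parallel layout of the Unicode Devanagari and Bengali blocks.
DEVANAGARI_START = 0x0900  # Hindi and Nepali are written in Devanagari
BENGALI_START = 0x0980
BLOCK_SIZE = 0x80


def block_offset(ch):
    """Return the character's offset inside its script block.

    Corresponding letters in the two blocks receive the same offset.
    Characters outside both blocks are placed after all block offsets.
    """
    cp = ord(ch)
    if DEVANAGARI_START <= cp < DEVANAGARI_START + BLOCK_SIZE:
        return cp - DEVANAGARI_START
    if BENGALI_START <= cp < BENGALI_START + BLOCK_SIZE:
        return cp - BENGALI_START
    return BLOCK_SIZE + cp


def uniform_key(word):
    """Build a comparison key from the per-character block offsets."""
    return tuple(block_offset(ch) for ch in word)


def uniform_sort(words):
    """Sort words from either script with the same key function."""
    return sorted(words, key=uniform_key)


if __name__ == "__main__":
    bengali = ["\u09ae\u09be", "\u0995\u09be"]      # Bengali: "ma", "ka"
    devanagari = ["\u092e\u093e", "\u0915\u093e"]   # Devanagari: "ma", "ka"
    print(uniform_sort(bengali))
    print(uniform_sort(devanagari))
```

Because the key depends only on block offsets, the same function orders Bengali and Devanagari word lists identically. A production collation would replace the offset table with the ordering prescribed by each language authority.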
Publisher
Association for Computing Machinery (ACM)