Author:
Rawat Sunita, Werulkar Lakshita, Jaywant Sagarika
Abstract
Language identification is a crucial step in any NLP-based application. The number of text-based documents and webpages on the Internet is growing rapidly, and documents written in languages from all over the world can be reached with a single click. A language identifier is therefore necessary to help the user interpret the content. Language identification has so far concentrated mainly on European languages and remains rather limited for traditional Indian languages. Many researchers have become increasingly interested in language identification among similar languages, in addition to widely used ones. In this paper, the Multinomial Naïve Bayes algorithm is used to detect three languages written in Devanagari (Marathi, Sanskrit and Hindi) and three European languages (French, Italian and English). An experiment on datasets for each language produced satisfactorily accurate results after training and testing the model.
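The approach described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and character n-gram counts as features; the abstract does not state the feature extraction, the hyperparameters, or the datasets, so the toy sentences, the n-gram range, the smoothing value, and the helper name identify_language below are all illustrative assumptions and not the authors' setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data, two sentences per language (illustrative only;
# these are NOT the datasets used in the paper).
train_texts = [
    "यह एक किताब है", "मैं स्कूल जा रहा हूँ",            # Hindi
    "हे एक पुस्तक आहे", "मी शाळेत जात आहे",              # Marathi
    "एतत् पुस्तकम् अस्ति", "अहं विद्यालयं गच्छामि",        # Sanskrit
    "Ceci est un livre", "Je vais à l'école",          # French
    "Questo è un libro", "Vado a scuola",              # Italian
    "This is a book", "I am going to school",          # English
]
train_labels = [
    "Hindi", "Hindi",
    "Marathi", "Marathi",
    "Sanskrit", "Sanskrit",
    "French", "French",
    "Italian", "Italian",
    "English", "English",
]

# Assumption: character n-grams (1 to 3 characters, within word
# boundaries) as count features, fed to Multinomial Naive Bayes
# with Laplace smoothing (alpha=1.0).
model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    MultinomialNB(alpha=1.0),
)
model.fit(train_texts, train_labels)


def identify_language(text: str) -> str:
    """Return the predicted language label for one piece of text."""
    return model.predict([text])[0]


if __name__ == "__main__":
    for sample in ["This is a school", "हे पुस्तक आहे"]:
        print(sample, "->", identify_language(sample))
```

With only two sentences per language this is a demonstration of the pipeline's shape, not a usable classifier; languages that share the Devanagari script would need far more training text to be separated reliably.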
Publisher
Perpetual Innovation Media Pvt. Ltd.