Effect of Stemming on Hindi Text Classification

Published: 2023-02-15
ISSN: 0976-5034
Container-title: International Journal of Next-Generation Computing
Short-container-title: ijngc

Authors:

Dr. Anjusha Pimpalshende, Preety Singh, Dr. Archana Potnurwar

Abstract

Text classification helps users search the large amount of textual data available online by dividing it into smaller, relevant units. Nowadays, a large number of digital documents are available in Indian languages, and designing text classifiers for these languages is an active research area, so that people can search for and read the documents they need in their local languages. In the proposed work, we design a text classifier for Hindi text documents and show how stemming affects the performance of Hindi text classifiers. Stemming is the process of reducing words, in any language, to their base or root forms. Stemmers are applied to written text rather than to spoken language. Stemming can improve the performance of many applications, such as text summarization, information retrieval (IR) systems, text classification systems and syntactic parsing. A stemmer removes the suffix or prefix of a word to obtain its root form, and these root words support the preprocessing step required by many algorithms. We applied various stemmers to Hindi text classification models, and the experimental results show that applying stemmers improves the performance of the classifiers.
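The abstract describes a stemmer as removing a word's suffix or prefix to recover its root, with the root forms then feeding a classifier's preprocessing step. As a minimal sketch, assuming a light suffix-stripping approach (prefix removal is omitted) and a small hypothetical suffix list that is not taken from the paper, the Python below shows how stemming conflates inflected Hindi forms and shrinks the bag-of-words vocabulary a classifier would see. The names `stem`, `tokenize`, `bag_of_words`, `SUFFIXES` and `MIN_STEM` are illustrative only and do not come from the authors' system.

```python
# Illustrative light suffix-stripping stemmer for Hindi plus a bag-of-words
# feature extractor. The suffix list is a small hypothetical subset chosen
# for this sketch, not the list used by the stemmers evaluated in the paper.
from collections import Counter

# Hypothetical suffixes: plural/oblique noun endings and a few verb endings.
SUFFIXES = sorted(
    ["ियों", "ियाँ", "ियां", "ाओं", "ाएं", "ाएँ",
     "ों", "ें", "ना", "ने", "नी", "ता", "ती", "ते", "कर"],
    key=len,
    reverse=True,  # try the longest suffix first
)
MIN_STEM = 2  # never leave a stem shorter than this many code points

# Danda, double danda and common ASCII punctuation become word separators.
_SEPARATORS = str.maketrans({ch: " " for ch in "।॥,.?!;:\"'()"})


def stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least MIN_STEM code points."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    """Map punctuation to spaces, then split on whitespace."""
    return text.translate(_SEPARATORS).split()


def bag_of_words(text: str, use_stemming: bool = True) -> Counter:
    """Term-frequency features for one document, optionally stemmed."""
    tokens = tokenize(text)
    if use_stemming:
        tokens = [stem(t) for t in tokens]
    return Counter(tokens)


if __name__ == "__main__":
    doc = "किताब, किताबें और किताबों। खेलना, खेलता और खेलते।"
    raw = bag_of_words(doc, use_stemming=False)
    stemmed = bag_of_words(doc)
    print(len(raw), len(stemmed))  # 7 3
    print(stemmed["किताब"], stemmed["खेल"])  # 3 3
```

On the sample sentence, seven distinct surface tokens collapse to three features, so the inflected forms of "किताब" (book) and "खेल" (play) each contribute to a single count. Trying the longest suffix first with a minimum stem length is a typical choice for light stemmers: the guard keeps a bare postposition such as "ने" intact, although a suffix list like this one would still over-stem some words (for example, stripping "नी" from "पानी").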

Publisher

Perpetual Innovation Media Pvt. Ltd.
