Predicting Marathi News Class Using Semantic Entity-Driven Clustering Approach
Author:
Saini Jatinderkumar R.1,
Bafna Prafulla Bharat2
Affiliation:
1. Symbiosis Institute of Computer Studies and Research, Symbiosis International University (Deemed), India
2. Symbiosis International University (Deemed), India
Abstract
Document management is a need of the era, and managing documents in regional languages is a significant and largely unexplored area. A Marathi corpus consisting of news is processed to form three representations: the Group Entity Document Matrix for Marathi (GEDMM), the Vector Space Model for Marathi (VSMM), and the Hysynset Vector Space Model for Marathi (HSVSMM). GEDMM uses entity groups extracted with a conditional random field (CRF). VSMM is constructed from the frequent terms using TF-IDF. HSVSMM uses synsets built from hypernyms-hyponyms and synonyms. GEDMM and HSVSMM apply dimension reduction by selecting significant feature groups. Hierarchical agglomerative clustering (HAC) is applied, and a dendrogram is produced to visualize the clusters. Cluster quality is analyzed using entropy and purity; the clusters produced using GEDMM show the minimum entropy and the highest purity. A random forest classifier is then applied, and its results are evaluated using misclassification error and accuracy.
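The abstract names entropy and purity as cluster-quality measures but does not state their formulas. The following is a minimal sketch that assumes the standard definitions: purity is the fraction of documents that belong to the majority class of their cluster, and entropy is the size-weighted average of each cluster's class entropy. The function name `cluster_purity_entropy` and the toy labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def cluster_purity_entropy(cluster_ids, true_labels):
    """Return (purity, entropy) of a clustering against reference labels.

    Assumes the standard definitions; the paper's exact formulas may differ.
    """
    if len(cluster_ids) != len(true_labels) or not cluster_ids:
        raise ValueError("inputs must be non-empty and of equal length")

    # Group the reference labels by the cluster each document was assigned to.
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster].append(label)

    n = len(true_labels)
    purity = 0.0
    entropy = 0.0
    for labels in members.values():
        counts = Counter(labels)
        size = len(labels)
        # Purity: the majority class count of this cluster, over all documents.
        purity += max(counts.values()) / n
        # Entropy of the class distribution inside this cluster (base 2).
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        # Weight each cluster's entropy by its share of the documents.
        entropy += (size / n) * h
    return purity, entropy


# Hypothetical toy data: six news items, two clusters, two news classes.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["sports", "sports", "politics", "politics", "politics", "politics"]
purity, entropy = cluster_purity_entropy(clusters, labels)
print(f"purity={purity:.3f} entropy={entropy:.3f}")
```

Under these definitions, lower entropy and higher purity both indicate clusters that align more closely with the reference news classes, which is the sense in which the GEDMM clusters are reported as best.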
Subject
Information Systems and Management, Strategy and Management, Computer Science Applications, Information Systems
Cited by
1 article.
1. A Novel Soft Voting Based Hybrid Approach to Detect Fake News in Hindi;2022 International Conference on Futuristic Technologies (INCOFT);2022-11-25