Semantic Term weighting representation for Kannada Document Classification

Published: 2023-01-18

Author:

Rangan R Kasturi¹, Harish B S²

Affiliation:

1. Vidyavardhaka College of Engineering

2. JSS Science & Technology University

Abstract

In natural language processing, the order in which terms appear plays a vital role: this positional information supports semantic analysis of the text. Conventional term weighting methods carry no such semantic information, which motivated us to propose a semantic term weighting representation. In addition, to address the demand for resources in Indian regional languages, and Kannada in particular, we have created a dataset of 11,045 Kannada documents. The dataset is multilabel and unbalanced. The proposed semantic term weighting representations, Term Frequency-Positional Encoding (TF-PE) and Term Frequency-Inverse Document Frequency-Positional Encoding (TF-IDF-PE), are applied to this dataset, and experiments are carried out with both K-fold cross-validation and a conventional train-test split. Of the two proposed representations, the Unicode-encoded TF-IDF-PE representation performed better than the TF-PE representation. The Unicode-encoded TF-IDF-PE representation with the SVM classifier yields the better average accuracy, 68.62%, in the 10-fold experiments.
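
The abstract names the TF-IDF-PE representation but does not give its formula. The sketch below is only one plausible reading, offered to make the idea concrete: it assumes the standard sinusoidal positional encoding and assumes that a term's TF-IDF weight scales the mean encoding of the positions where the term occurs. The function names (tf_idf, sinusoidal_pe, tf_idf_pe), the dim parameter, and the toy tokens are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def sinusoidal_pe(pos, dim=4):
    """Sinusoidal encoding of a 0-based position (assumed form; dim must be even)."""
    vec = []
    for i in range(0, dim, 2):
        angle = pos / (10000 ** (i / dim))
        vec.extend([math.sin(angle), math.cos(angle)])
    return vec


def tf_idf_pe(docs, dim=4):
    """Assumed combination: TF-IDF weight times the mean positional
    encoding over all positions where the term occurs in the document."""
    result = []
    for doc, weights in zip(docs, tf_idf(docs)):
        positions = defaultdict(list)
        for pos, term in enumerate(doc):
            positions[term].append(pos)
        rep = {}
        for term, pos_list in positions.items():
            encodings = [sinusoidal_pe(p, dim) for p in pos_list]
            mean_pe = [sum(col) / len(col) for col in zip(*encodings)]
            rep[term] = [weights[term] * x for x in mean_pe]
        result.append(rep)
    return result


# Toy corpus: transliterated stand-ins for Kannada tokens (hypothetical data).
docs = [
    ["kannada", "bhashe", "sundara"],
    ["kannada", "sahitya"],
]
vectors = tf_idf_pe(docs)
```

Under these assumptions, a term that occurs in every document (here "kannada") has an IDF of zero and therefore an all-zero vector, while terms unique to one document keep a position-dependent vector. The resulting per-term vectors would still need to be pooled into a fixed-length document vector before being passed to a classifier such as an SVM; the abstract does not say how that step is done.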

Publisher

Research Square Platform LLC
