Application of an Improved TF-IDF Method in Literary Text Classification

Published: 2022-05-09 Volume: 2022 Pages: 1-10
ISSN:1687-5699
Container-title:Advances in Multimedia
language:en

Author:

Xiang Lin¹

Affiliation:

1. Public Basic Course Teaching Department, Hubei University of Police, Wuhan 430030, China

Abstract

Literature plays an important role in the advancement of human civilization. Literary texts of many genres have been produced since ancient times, and more appear every day. A pressing concern for those who manage literary collections is how to classify and store this expanding mass of text so that readers can access it easily. In text classification, TF-IDF is a widely used feature-weighting algorithm. However, the approach has significant shortcomings: it does not capture how feature words are distributed within a category, it does not capture how they are distributed between categories, and it cannot adapt to skewed datasets. Standard TF-IDF relies on the association between a feature word and the number of texts in which it appears, while ignoring the variation in feature word distribution across categories; in this paper's application scenario, accounting for that variation can improve classification accuracy. To classify the literary texts in this study, this work proposes an improved IDF method that addresses feature words which appear many times and carry different meanings in different fields. Separating the meanings of feature words in distinct domains increases confidence in the output of the TF-IDF algorithm. Comparative experiments on feature dimension selection, feature selection algorithm, feature weighting algorithm, and classifier show that the improved TF-IDF method proposed here, combined with a random forest (RF) classifier, achieves good classification performance and meets the needs of practical work. The work has some historical significance.
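
The limitation described in the abstract can be illustrated with a minimal sketch of classic TF-IDF. This is not the paper's improved method, whose formula is not given here; the toy corpus, the category labels, and the helper names (`idf`, `tf_idf`, `category_counts`) are all assumptions made for illustration. The sketch shows that two words with the same document frequency receive the same IDF, even when one is concentrated in a single category and the other is spread across categories.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (category, tokens); not data from the paper.
corpus = [
    ("poetry", ["moon", "river", "verse"]),
    ("poetry", ["moon", "verse", "rhyme"]),
    ("novel", ["river", "chapter", "plot"]),
    ("novel", ["chapter", "plot", "hero"]),
]
docs = [tokens for _, tokens in corpus]


def idf(term, docs):
    """Classic IDF, log(N / df): uses only document counts, never category labels."""
    df = sum(1 for tokens in docs if term in tokens)
    return math.log(len(docs) / df) if df else 0.0


def tf_idf(term, tokens, docs):
    """Term frequency within one document multiplied by the corpus-wide IDF."""
    tf = Counter(tokens)[term] / len(tokens)
    return tf * idf(term, docs)


def category_counts(term):
    """Number of documents per category that contain the term."""
    return Counter(label for label, tokens in corpus if term in tokens)


# "moon" occurs only in poetry; "river" is split between poetry and novel.
# Both occur in 2 of 4 documents, so classic IDF scores them identically.
for term in ("moon", "river"):
    print(term, dict(category_counts(term)), idf(term, docs))
```

Here "moon" would be the more useful feature for separating the two categories, yet classic IDF cannot tell it apart from "river". A category-aware weighting would need the information returned by `category_counts`, which is the kind of distribution information the abstract says standard TF-IDF lacks.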

Funder

Hubei University of Police

Publisher

Hindawi Limited

Subject

General Computer Science

Link

http://downloads.hindawi.com/journals/am/2022/9285324.pdf
