Research on performance variations of classifiers with the influence of pre-processing methods for Chinese short text classification

Published:2023-10-12 Issue:10 Volume:18 Page:e0292582
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Zhang Dezheng, Li Jing, Xie Yonghong, Wulamu Aziguli

Abstract

Text preprocessing is an important component of Chinese text classification. At present, however, most studies on this topic explore the influence of preprocessing methods on only a few text classification algorithms, using English text. In this paper, we experimentally compared fifteen commonly used classifiers on two Chinese datasets using three widely used Chinese preprocessing methods: word segmentation, Chinese-specific stop word removal, and Chinese-specific symbol removal. We then explored the influence of the preprocessing methods on the final classifications under various conditions, such as classification evaluation, combination style, and classifier selection. Finally, we conducted a battery of additional experiments and found that most of the classifiers improved in performance after proper preprocessing was applied. Our general conclusion is that the systematic use of preprocessing methods can have a positive impact on the classification of Chinese short text, given a classification evaluation such as macro-F1, a combination of preprocessing methods such as word segmentation with Chinese-specific stop word and symbol removal, and a classifier selection such as machine learning and deep learning models. The best macro-F1 scores for the two datasets are 92.13% and 91.99%, which represent improvements of 0.3% and 2%, respectively, over the compared baselines.
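
As a rough illustration of the three preprocessing steps and the macro-F1 metric named in the abstract, the following minimal Python sketch may help. It is not the authors' implementation: the stop-word list, the symbol pattern, and the helper names (`preprocess`, `char_segment`, `macro_f1`) are assumptions made for this example, and `char_segment` is only a character-level stand-in for a real Chinese word segmenter.

```python
import re
from itertools import product

# Hypothetical, tiny stop-word list for illustration; real lists are far larger.
STOP_WORDS = {"的", "了", "是", "在"}

# Assumed symbol pattern: anything that is not a CJK ideograph, ASCII letter, or digit.
SYMBOLS = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]+")


def char_segment(text):
    """Stand-in segmenter: one token per non-space character (not true word segmentation)."""
    return [ch for ch in text if not ch.isspace()]


def preprocess(text, segment=None, remove_stop_words=False, remove_symbols=False):
    """Apply the selected preprocessing steps and return a list of tokens."""
    if remove_symbols:
        text = SYMBOLS.sub(" ", text)
    # Without a segmenter, only split on existing whitespace.
    tokens = segment(text) if segment else text.split()
    if remove_stop_words:
        tokens = [tok for tok in tokens if tok not in STOP_WORDS]
    return tokens


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    sample = "今天的天气，真好！"
    # Enumerate all on/off combinations of the three preprocessing steps.
    for use_seg, use_stop, use_sym in product([False, True], repeat=3):
        tokens = preprocess(
            sample,
            segment=char_segment if use_seg else None,
            remove_stop_words=use_stop,
            remove_symbols=use_sym,
        )
        print(use_seg, use_stop, use_sym, tokens)
```

In an actual experiment, `char_segment` would be replaced by a dictionary-based or statistical word segmenter, and each token list would be passed to a classifier whose predictions are then scored with `macro_f1`.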

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

Cited by 2 articles.

1. Integrating deep learning and multi-attention for joint extraction of entities and relationships in engineering consulting texts;Automation in Construction;2024-12

2. A Comparative Study of Hybrid Models in Health Misinformation Text Classification;4th International Workshop on OPEN CHALLENGES IN ONLINE SOCIAL NETWORKS;2024-09-10