Comparison of Supervised Classification Models on Textual Data

Published: 2020-05-24; Volume: 8; Issue: 5; Page: 851
ISSN: 2227-7390
Journal: Mathematics
Language: English

Author:

Hsu Bi-Min

Abstract

Text classification is an essential component of many applications, such as spam detection and sentiment analysis. With the growing number of textual documents and datasets generated through social media and news articles, an increasing number of machine learning methods are needed to classify text accurately. In this paper, a comprehensive evaluation of multiple supervised learning models was conducted. The models were logistic regression (LR), decision trees (DT), support vector machines (SVM), AdaBoost (AB), random forests (RF), multinomial naive Bayes (NB), multilayer perceptrons (MLP), and gradient boosting (GB). The evaluation assessed their efficiency, robustness, and limitations in classifying textual data. SVM, LR, and MLP generally performed better, with SVM performing best, while DT and AB had much lower accuracies than the other tested models. A further comparison of SVM kernels demonstrated the advantage of linear kernels over polynomial, sigmoid, and radial basis function kernels for text classification. The effect of removing stop words on model performance was also investigated: DT performed better with stop words removed, while all other models were relatively unaffected by the presence or absence of stop words.
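The experimental design in the abstract trains supervised classifiers on word-count features and compares accuracy with and without stop-word removal. The sketch below shows the shape of such an experiment, under stated assumptions. It uses only the Python standard library and covers just two of the listed models, multinomial naive Bayes and logistic regression, written from scratch.

Everything in it is hypothetical and is not the paper's setup:

- the toy corpus (`TRAIN`, `TEST`);
- the `STOP_WORDS` list;
- the helper names (`tokenize`, `SGDLogisticRegression`, `accuracy`, `evaluate`);
- the hyperparameters (`alpha=1.0`, `lr=0.5`, `epochs=50`).

Accuracies on four toy sentences say nothing about the results reported in the paper.

```python
import math
from collections import Counter

# Illustrative sketch only: all names, data, and hyperparameters here are hypothetical.
# Hypothetical stop-word list (not the list used in the paper).
STOP_WORDS = {"the", "a", "an", "is", "was", "this", "it", "and", "of", "to"}

# Hypothetical toy sentiment corpus of (text, label) pairs: 1 = positive, 0 = negative.
TRAIN = [
    ("the movie was great and fun", 1), ("a great plot and a great cast", 1),
    ("this film is wonderful", 1), ("fun and wonderful acting", 1),
    ("the movie was boring", 0), ("a terrible plot and bad acting", 0),
    ("this film is awful", 0), ("boring and terrible", 0),
]
TEST = [
    ("a wonderful and fun movie", 1), ("the cast was great", 1),
    ("an awful and boring film", 0), ("bad plot", 0),
]

def tokenize(text, remove_stop_words=False):
    """Lowercase whitespace tokenizer with optional stop-word filtering."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens

class MultinomialNB:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in d}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((counts[t] + self.alpha) / denom) for t in self.vocab}
        return self

    def predict(self, docs):
        preds = []
        for d in docs:
            # Sum log-likelihoods of known tokens; out-of-vocabulary tokens are ignored.
            scores = {c: self.log_prior[c] + sum(self.log_lik[c][t] for t in d if t in self.vocab)
                      for c in self.classes}
            preds.append(max(scores, key=scores.get))
        return preds

class SGDLogisticRegression:
    """Binary logistic regression on bag-of-words counts, trained with plain SGD."""
    def __init__(self, lr=0.5, epochs=50):
        self.lr, self.epochs = lr, epochs

    def fit(self, docs, labels):
        self.w, self.b = Counter(), 0.0
        for _ in range(self.epochs):
            for d, y in zip(docs, labels):
                x = Counter(d)
                g = self._proba(x) - y  # gradient of the log-loss w.r.t. the logit
                for t, v in x.items():
                    self.w[t] -= self.lr * g * v
                self.b -= self.lr * g
        return self

    def _proba(self, x):
        z = self.b + sum(self.w[t] * v for t, v in x.items())
        z = max(-30.0, min(30.0, z))  # clamp the logit so math.exp cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    def predict_proba(self, docs):
        return [self._proba(Counter(d)) for d in docs]

    def predict(self, docs):
        return [1 if p >= 0.5 else 0 for p in self.predict_proba(docs)]

def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

def evaluate(train, test, remove_stop_words):
    """Train each model on `train` and return its accuracy on `test`."""
    X_tr = [tokenize(text, remove_stop_words) for text, _ in train]
    y_tr = [y for _, y in train]
    X_te = [tokenize(text, remove_stop_words) for text, _ in test]
    y_te = [y for _, y in test]
    models = {"NB": MultinomialNB(), "LR": SGDLogisticRegression()}
    return {name: accuracy(y_te, m.fit(X_tr, y_tr).predict(X_te)) for name, m in models.items()}

if __name__ == "__main__":
    for remove in (False, True):
        label = "stop words removed" if remove else "stop words kept"
        print(label, evaluate(TRAIN, TEST, remove))
```

More of the abstract's models could be added to the `models` dictionary, provided they expose the same `fit(docs, labels)` and `predict(docs)` methods over token lists. This interface is a choice made for the sketch, not one described in the paper.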

Funder

Ministry of Science and Technology, Taiwan

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2227-7390/8/5/851/pdf


Cited by 31 articles.

1. A systematic review and meta-analysis of machine learning, deep learning, and ensemble learning approaches in predicting EV charging behavior; Engineering Applications of Artificial Intelligence; 2024-09

2. A new probability adjustment method for combining conflicting evidences: application in classifier combination; Soft Computing; 2024-07-24

3. A systematic review and meta-analysis of artificial neural network, machine learning, deep learning, and ensemble learning approaches in field of geotechnical engineering; Neural Computing and Applications; 2024-05-13

4. Multi-Class Multi-Level Classification of Mental Health Disorders Based on Textual Data from Social Media; Journal of Information and Communication Technology; 2024-01-30

5. Evaluating the Impact of Assignment Group and Category Classification Prediction of Incoming Service Requests on the Perceived Service Quality: A Quasiexperimental Study in the Enterprise Software Industry; IEEE Transactions on Engineering Management; 2024