Twenty Years of Machine-Learning-Based Text Classification: A Systematic Review

Published:2023-04-29 Issue:5 Volume:16 Page:236
ISSN:1999-4893
Container-title:Algorithms
language:en
Short-container-title:Algorithms

Author:

Palanivinayagam Ashokkumar¹, El-Bayeh Claude Ziad², Damaševičius Robertas³

Affiliation:

1. Sri Ramachandra Faculty of Engineering and Technology, Sri Ramachandra Institute of Higher Education and Research, Chennai 600116, India

2. Department of Electrical Engineering, Bayeh Institute, Amchit 4307, Lebanon

3. Department of Software Engineering, Kaunas University of Technology, 44249 Kaunas, Lithuania

Abstract

Machine-learning-based text classification is one of the leading research areas and has a wide range of applications, which include spam detection, hate speech identification, reviews, rating summarization, sentiment analysis, and topic modelling. Existing machine-learning-based studies differ in the datasets, training methods, performance evaluation, and comparison methods they use. In this paper, we surveyed 224 papers published between 2003 and 2022 that employed machine learning for text classification. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement was used as the guideline for the systematic review process. The differences across the literature are analyzed in terms of six aspects: datasets, machine learning models, best accuracy, performance evaluation metrics, training and testing splitting methods, and comparisons among machine learning models. Furthermore, we highlight the limitations and research gaps in the literature. Although the research works included in the survey perform well in terms of text classification, improvement is required in many areas. We believe that this survey paper will be useful for researchers in the field of text classification.

Publisher

MDPI AG

Subject

Computational Mathematics, Computational Theory and Mathematics, Numerical Analysis, Theoretical Computer Science

Link

https://www.mdpi.com/1999-4893/16/5/236/pdf

References: 137 articles.

1. Sebastiani (2002). Machine Learning in Automated Text Categorization. ACM Comput. Surv.

2. Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., and Brown, D. (2019). Text Classification Algorithms: A Survey. Information, 10.

3. Kapočiute-Dzikiene, J. (2020). A domain-specific generative chatbot trained from little data. Appl. Sci., 10.

4. Rogers (2022). Real-Time Text Classification of User-Generated Content on Social Media: Systematic Review. IEEE Trans. Comput. Soc. Syst.

5. Karayigit (2022). BERT-based Transfer Learning Model for COVID-19 Sentiment Analysis on Turkish Instagram Comments. Inf. Technol. Control

Cited by 17 articles.

1. GBERT: A hybrid deep learning model based on GPT-BERT for fake news detection;Heliyon;2024-08

2. Text classification based on optimization feature selection methods: a review and future directions;Multimedia Tools and Applications;2024-07-06

3. EDUCATIONAL DATA MINING AND LEARNING ANALYTICS: TEXT GENERATORS USAGE EFFECT ON STUDENTS’ GRADES;New Trends in Computer Sciences;2024-06-04

4. Analysis and Research on Automatic Text Classification Algorithm Based on Machine Learning;2024 6th International Conference on Communications, Information System and Computer Engineering (CISCE);2024-05-10

5. Methods and applications of machine learning in computational design of optoelectronic semiconductors;Science China Materials;2024-03-19