Affiliation:
1. Sri Ramachandra Faculty of Engineering and Technology, Sri Ramachandra Institute of Higher Education and Research, Chennai 600116, India
2. Department of Electrical Engineering, Bayeh Institute, Amchit 4307, Lebanon
3. Department of Software Engineering, Kaunas University of Technology, 44249 Kaunas, Lithuania
Abstract
Machine-learning-based text classification is one of the leading research areas and has a wide range of applications, which include spam detection, hate speech identification, reviews, rating summarization, sentiment analysis, and topic modelling. Widely used machine-learning-based research differs in terms of the datasets, training methods, performance evaluation, and comparison methods used. In this paper, we surveyed 224 papers published between 2003 and 2022 that employed machine learning for text classification. The Preferred Reporting Items for Systematic Reviews (PRISMA) statement is used as the guidelines for the systematic review process. The comprehensive differences in the literature are analyzed in terms of six aspects: datasets, machine learning models, best accuracy, performance evaluation metrics, training and testing splitting methods, and comparisons among machine learning models. Furthermore, we highlight the limitations and research gaps in the literature. Although the research works included in the survey perform well in terms of text classification, improvement is required in many areas. We believe that this survey paper will be useful for researchers in the field of text classification.
Subject
Computational Mathematics,Computational Theory and Mathematics,Numerical Analysis,Theoretical Computer Science
Reference137 articles.
1. Machine Learning in Automated Text Categorization;Sebastiani;ACM Comput. Surv.,2002
2. Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., and Brown, D. (2019). Text Classification Algorithms: A Survey. Information, 10.
3. Kapočiute-Dzikiene, J. (2020). A domain-specific generative chatbot trained from little data. Appl. Sci., 10.
4. Real-Time Text Classification of User-Generated Content on Social Media: Systematic Review;Rogers;IEEE Trans. Comput. Soc. Syst.,2022
5. BERT-based Transfer Learning Model for COVID-19 Sentiment Analysis on Turkish Instagram Comments;Karayigit;Inf. Technol. Control,2022
Cited by
17 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献