Abstract
Automatic text classification, in which textual data is assigned to predefined categories based on its content, is a classic problem in natural language processing. In recent years, research on medical text classification has surged because of the increasing availability of medical data such as patient medical records and medical literature. Machine learning and statistical methods have proven to be highly efficient for these tasks. However, a significant amount of manual labor is still required to label the large datasets used for training. Recent research has demonstrated that pretrained language models, including machine learning models, reduce the time and effort that medical experts must spend on feature engineering. However, applying the machine learning model directly to the classification task yields no statistically significant improvement in performance. In this paper, we present a hybrid machine learning model that combines individual traditional algorithms and is augmented by a genetic algorithm. The improved model is designed to enhance performance by optimizing the weight parameters. The best single model achieved commendable accuracy, and applying the hybridization approach with optimized weight parameters substantially improved the results. The results underscore the superiority of our augmented hybrid model over individual traditional algorithms. We conduct experiments on two distinct types of datasets: one comprising medical records, such as the Heart Failure Clinical Record, and another consisting of medical literature, such as PubMed 20k RCT. The objective is to demonstrate the effectiveness of our approach by highlighting the significant improvements in accuracy, precision, recall, and F1-score achieved by our improved model.
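The idea of combining several classifiers through weights tuned by a genetic algorithm can be illustrated with a minimal sketch. The abstract does not specify the base algorithms, the fitness function, or the genetic operators, so everything below is an assumption made for illustration: the toy probabilities stand in for the outputs of three hypothetical base models, fitness is plain accuracy, and the operators are elitist selection, blend crossover, and Gaussian mutation. It is not the authors' implementation.

```python
import random


def weighted_vote(probs, weights):
    """Combine per-model positive-class probabilities into 0/1 predictions."""
    total = sum(weights) or 1.0  # guard against an all-zero weight vector
    n_samples = len(probs[0])
    preds = []
    for i in range(n_samples):
        score = sum(w * p[i] for w, p in zip(weights, probs)) / total
        preds.append(1 if score >= 0.5 else 0)
    return preds


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def optimize_weights(probs, labels, pop_size=20, generations=30, sigma=0.1, seed=0):
    """Search for ensemble weights with a small elitist genetic algorithm.

    All hyperparameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_models = len(probs)

    def fitness(weights):
        return accuracy(weighted_vote(probs, weights), labels)

    # Seed the population with each single model (one-hot weights) and the
    # equal-weight ensemble, then fill the rest with random weight vectors.
    population = [[1.0 if j == i else 0.0 for j in range(n_models)]
                  for i in range(n_models)]
    population.append([1.0] * n_models)
    while len(population) < pop_size:
        population.append([rng.random() for _ in range(n_models)])

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: max(2, pop_size // 4)]  # survivors are kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            alpha = rng.random()
            # Blend crossover followed by Gaussian mutation, clamped to [0, 1].
            child = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
            child = [min(1.0, max(0.0, w + rng.gauss(0.0, sigma))) for w in child]
            children.append(child)
        population = elite + children

    best = max(population, key=fitness)
    return best, fitness(best)


# Toy data: labels and the positive-class probabilities of three hypothetical models.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [
    [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3],
    [0.6, 0.4, 0.7, 0.3, 0.2, 0.3, 0.8, 0.6],
    [0.7, 0.6, 0.6, 0.9, 0.4, 0.2, 0.3, 0.1],
]

single_accs = [accuracy([1 if p >= 0.5 else 0 for p in model], labels) for model in probs]
best_weights, best_acc = optimize_weights(probs, labels)
print("single-model accuracies:", single_accs)
print("ensemble accuracy with optimized weights:", best_acc)
```

Because the initial population contains each single model as a one-hot weight vector and the elite individuals survive every generation unchanged, the returned ensemble can never score below the best single model on the data used for fitness. In practice the fitness would be computed on a held-out validation split rather than on the training data, to avoid overfitting the weights.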
Publisher
Springer Science and Business Media LLC