Affiliation:
1. Advanced Manufacturing Institute, King Saud University, Riyadh, Saudi Arabia
2. School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, Tamil Nadu, India
3. Department of Computer Science and Engineering, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Ramapuram, Chennai, India
Abstract
The emergence of the novel COVID-19 virus has had a profound impact on global healthcare systems and economies, underscoring the imperative need for the development of precise and expeditious diagnostic tools. Machine learning techniques have emerged as a promising avenue for augmenting the capabilities of medical professionals in disease diagnosis and classification. In this research, the EFS-XGBoost classifier model, a robust approach for the classification of patients afflicted with COVID-19 is proposed. The key innovation in the proposed model lies in the Ensemble-based Feature Selection (EFS) strategy, which enables the judicious selection of relevant features from the expansive COVID-19 dataset. Subsequently, the power of the eXtreme Gradient Boosting (XGBoost) classifier to make precise distinctions among COVID-19-infected patients is harnessed.The EFS methodology amalgamates five distinctive feature selection techniques, encompassing correlation-based, chi-squared, information gain, symmetric uncertainty-based, and gain ratio approaches. To evaluate the effectiveness of the model, comprehensive experiments were conducted using a COVID-19 dataset procured from Kaggle, and the implementation was executed using Python programming. The performance of the proposed EFS-XGBoost model was gauged by employing well-established metrics that measure classification accuracy, including accuracy, precision, recall, and the F1-Score. Furthermore, an in-depth comparative analysis was conducted by considering the performance of the XGBoost classifier under various scenarios: employing all features within the dataset without any feature selection technique, and utilizing each feature selection technique in isolation. The meticulous evaluation reveals that the proposed EFS-XGBoost model excels in performance, achieving an astounding accuracy rate of 99.8%, surpassing the efficacy of other prevailing feature selection techniques. This research not only advances the field of COVID-19 patient classification but also underscores the potency of ensemble-based feature selection in conjunction with the XGBoost classifier as a formidable tool in the realm of medical diagnosis and classification.