A Highly Discriminative Hybrid Feature Selection Algorithm for Cancer Diagnosis

Published:2022-08-09 Issue: Volume:2022 Page:1-15
ISSN:1537-744X
Container-title:The Scientific World Journal
language:en
Short-container-title:The Scientific World Journal

Author:

Elemam Tarneem¹, Elshrkawey Mohamed¹

Affiliation:

1. Information Systems Department, Suez Canal University, Ismailia 41522, Egypt

Abstract

Cancer is a deadly disease that occurs due to rapid and uncontrolled cell growth. In this article, a machine learning (ML) algorithm is proposed to diagnose different cancers from big data. The algorithm comprises a two-stage hybrid feature selection. In the first stage, an overall ranker combines the results of three filter-based feature evaluation methods, namely, chi-squared, F-statistic, and mutual information (MI), and the features are ordered according to this combination. In the second stage, a modified wrapper-based sequential forward selection is used to discover the optimal feature subset, using ML models such as support vector machine (SVM), decision tree (DT), random forest (RF), and K-nearest neighbor (KNN) classifiers. To examine the proposed algorithm, tests were carried out on four cancer microarray datasets, employing 10-fold cross-validation and hyperparameter tuning. The performance of the algorithm is evaluated by its diagnostic accuracy. For the leukemia dataset, both the SVM and KNN models reach the highest accuracy, 100%, using only 5 features. For the ovarian cancer dataset, the SVM model achieves the highest accuracy, 100%, using only 6 features. For the small round blue cell tumor (SRBCT) dataset, the SVM model also achieves the highest accuracy, 100%, using only 8 features. For the lung cancer dataset, the SVM model achieves the highest accuracy, 99.57%, using 19 features. Compared with other algorithms, the proposed algorithm is superior in terms of both the number of selected features and diagnostic accuracy.
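The two-stage idea in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not say how the three filter results are combined or how the sequential forward selection is modified, so the mean-rank aggregation, the "keep a feature only if the score improves" rule, and the names `overall_rank`, `forward_select`, and `toy_accuracy` are all assumptions made for illustration. In practice the per-feature scores could come from filter methods such as scikit-learn's `chi2`, `f_classif`, and `mutual_info_classif`, and the `evaluate` callback would return the 10-fold cross-validated accuracy of a classifier such as an SVM.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def overall_rank(scores_by_method: Dict[str, Sequence[float]]) -> List[int]:
    """Stage 1 (assumed rule): order feature indices by summed rank across methods."""
    n_features = len(next(iter(scores_by_method.values())))
    rank_sum = [0] * n_features
    for scores in scores_by_method.values():
        # A higher score means a more relevant feature, so rank 0 is the best.
        order = sorted(range(n_features), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            rank_sum[j] += rank
    # Ties are broken by feature index to keep the ordering deterministic.
    return sorted(range(n_features), key=lambda j: (rank_sum[j], j))


def forward_select(
    ranked: Sequence[int],
    evaluate: Callable[[List[int]], float],
    max_features: int,
) -> Tuple[List[int], float]:
    """Stage 2 (assumed rule): walk the ranked list, keeping a feature only if it helps."""
    selected: List[int] = []
    best = float("-inf")
    for j in ranked:
        if len(selected) >= max_features:
            break
        score = evaluate(selected + [j])
        if score > best:  # strict improvement required to grow the subset
            selected.append(j)
            best = score
    return selected, best


# Hypothetical filter scores for four features (made-up numbers, not from the paper).
toy_scores = {
    "chi2": [5.0, 0.1, 3.0, 0.2],
    "f_statistic": [4.0, 0.3, 6.0, 0.1],
    "mutual_info": [0.9, 0.05, 0.7, 0.1],
}


def toy_accuracy(subset: List[int]) -> float:
    # Stand-in for cross-validated classifier accuracy: only features 0 and 2 help.
    return len(set(subset) & {0, 2}) / 2


ranked = overall_rank(toy_scores)
selected, best = forward_select(ranked, toy_accuracy, max_features=3)
print(ranked, selected, best)
```

Ranking first and then scanning the ranked list means the wrapper stage calls the (expensive) classifier evaluation at most once per feature, instead of once per remaining feature at every step as in plain sequential forward selection. Whether the paper's modification works this way is not stated in the abstract.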

Publisher

Hindawi Limited

Subject

General Environmental Science; General Biochemistry, Genetics and Molecular Biology; General Medicine

Link

http://downloads.hindawi.com/journals/tswj/2022/1056490.pdf

Reference 57 articles.

1. Gene selection and classification of microarray data method based on mutual information and moth flame algorithm;A. Dabba;Expert Systems with Applications,2021

2. Optimizing ANFIS using simulated annealing algorithm for classification of microarray gene expression cancer data;B. Haznedar;Medical and Biological Engineering and Computing,2021

3. Gene selection for microarray data classification via dual latent representation learning

4. A novel ECOC algorithm for multiclass microarray data classification based on data complexity analysis

5. Bacterial foraging optimization algorithm based feature selection for microarray data classification;M. J. Rani;Materials Today Proceedings,2021

Cited by 7 articles.

1. Deciphering Key Genes in Colon Cancer Through Deep Learning Techniques;2024 Third International Conference on Smart Technologies and Systems for Next Generation Computing (ICSTSN);2024-07-18

2. Hybrid wrapper feature selection method based on genetic algorithm and extreme learning machine for intrusion detection;Journal of Big Data;2024-02-01

3. Ensemble of Deep Features for Breast Cancer Histopathological Image Classification;The Computer Journal;2024-01-14

4. Hybrid Gene Selection Methods for High-Dimensional Lung Cancer Data Using Improved Arithmetic Optimization Algorithm;Computers, Materials & Continua;2024

5. Feature Engineering with Microarray Gene Expression Techniques for Asymptomatic Disease Classification;2023 1st DMIHER International Conference on Artificial Intelligence in Education and Industry 4.0 (IDICAIEI);2023-11-27