Effective hybrid feature selection using different bootstrap enhances cancers classification performance-Reference-Cited by-同舟云学术

Effective hybrid feature selection using different bootstrap enhances cancers classification performance

Published:2022-09-30 Issue:1 Volume:15 Page:
ISSN:1756-0381
Container-title:BioData Mining
language:en
Short-container-title:BioData Mining

Author:

Abdelwahed Noura Mohammed^ORCID,El-Tawel Gh. S.,Makhlouf M. A.

Abstract

Abstract Background Machine learning can be used to predict the different onset of human cancers. Highly dimensional data have enormous, complicated problems. One of these is an excessive number of genes plus over-fitting, fitting time, and classification accuracy. Recursive Feature Elimination (RFE) is a wrapper method for selecting the best subset of features that cause the best accuracy. Despite the high performance of RFE, time computation and over-fitting are two disadvantages of this algorithm. Random forest for selection (RFS) proves its effectiveness in selecting the effective features and improving the over-fitting problem. Method This paper proposed a method, namely, positions first bootstrap step (PFBS) random forest selection recursive feature elimination (RFS-RFE) and its abbreviation is PFBS- RFS-RFE to enhance cancer classification performance. It used a bootstrap with many positions included in the outer first bootstrap step (OFBS), inner first bootstrap step (IFBS), and outer/ inner first bootstrap step (O/IFBS). In the first position, OFBS is applied as a resampling method (bootstrap) with replacement before selection step. The RFS is applied with bootstrap = false i.e., the whole datasets are used to build each tree. The importance features are hybrid with RFE to select the most relevant subset of features. In the second position, IFBS is applied as a resampling method (bootstrap) with replacement during applied RFS. The importance features are hybrid with RFE. In the third position, O/IFBS is applied as a hybrid of first and second positions. RFE used logistic regression (LR) as an estimator. The proposed methods are incorporated with four classifiers to solve the feature selection problems and modify the performance of RFE, in which five datasets with different size are used to assess the performance of the PFBS-RFS-RFE. Results The results showed that the O/IFBS-RFS-RFE achieved the best performance compared with previous work and enhanced the accuracy, variance and ROC area for RNA gene and dermatology erythemato-squamous diseases datasets to become 99.994%, 0.0000004, 1.000 and 100.000%, 0.0 and 1.000, respectively. Conclusion High dimensional datasets and RFE algorithm face many troubles in cancers classification performance. PFBS-RFS-RFE is proposed to fix these troubles with different positions. The importance features which extracted from RFS are used with RFE to obtain the effective features.

Funder

Suez Canal University

Publisher

Springer Science and Business Media LLC

Subject

Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Genetics,Molecular Biology,Biochemistry

Link

https://link.springer.com/content/pdf/10.1186/s13040-022-00304-y.pdf

Reference63 articles.

1. Tran KA, Kondrashova O, Bradley A, Williams ED, Pearson JV, Waddell N. Deep learning in cancer diagnosis, prognosis and treatment selection. Genome Med. 2021;13(1):152. https://doi.org/10.1186/s13073-021-00968-x.

2. Bi WL, Hosny A, Schabath MB, Giger ML, Birkbak NJ, Mehrtash A, et al. Artificial intelligence in cancer imaging: clinical challenges and applications. CA Cancer J Clin. 2019;69(2):127–57. https://doi.org/10.3322/caac.21552.

3. Fang H, Shi K, Wang X, Zuo C, Lan X. Artificial intelligence in positron emission tomography. Front Med (Lausanne). 2022;9:848336. https://doi.org/10.3389/fmed.2022.848336 PMID: 35174194; PMCID: PMC8841845.

4. Alfayez AA, Kunz H, Lai AG. Predicting the risk of cancer in adults using supervised machine learning: a scoping review. BMJ Open. 2021;11(9). https://doi.org/10.1136/bmjopen-2020-047755 .

5. Liew XY, Hameed N, Clos J. A review of computer-aided expert systems for breast cancer diagnosis. Cancers (Basel). 2021;13(11):2764. https://doi.org/10.3390/cancers13112764 PMID: 34199444; PMCID: PMC8199592.

Cited by 1 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Applying a nomogram based on preoperative CT to predict early recurrence of laryngeal squamous cell carcinoma after surgery;Journal of X-Ray Science and Technology;2023-05-11