Stable bagging feature selection on medical data-Reference-Cited by-同舟云学术

Stable bagging feature selection on medical data

Published:2021-01-07 Issue:1 Volume:8 Page:
ISSN:2196-1115
Container-title:Journal of Big Data
language:en
Short-container-title:J Big Data

Author:

Alelyani Salem^ORCID

Abstract

AbstractIn the medical field, distinguishing genes that are relevant to a specific disease, let’s say colon cancer, is crucial to finding a cure and understanding its causes and subsequent complications. Usually, medical datasets are comprised of immensely complex dimensions with considerably small sample size. Thus, for domain experts, such as biologists, the task of identifying these genes have become a very challenging one, to say the least. Feature selection is a technique that aims to select these genes, or features in machine learning field with respect to the disease. However, learning from a medical dataset to identify relevant features suffers from the curse-of-dimensionality. Due to a large number of features with a small sample size, the selection usually returns a different subset each time a new sample is introduced into the dataset. This selection instability is intrinsically related to data variance. We assume that reducing data variance improves selection stability. In this paper, we propose an ensemble approach based on the bagging technique to improve feature selection stability in medical datasets via data variance reduction. We conducted an experiment using four microarray datasets each of which suffers from high dimensionality and relatively small sample size. On each dataset, we applied five well-known feature selection algorithms to select varying number of features. The proposed technique shows a significant improvement in selection stability while at least maintaining the classification accuracy. The stability improvement ranges from 20 to 50 percent in all cases. This implies that the likelihood of selecting the same features increased 20 to 50 percent more. This is accompanied with the increase of classification accuracy in most cases, which signifies the stated results of stability.

Funder

King Khalid University

Publisher

Springer Science and Business Media LLC

Subject

Information Systems and Management,Computer Networks and Communications,Hardware and Architecture,Information Systems

Link

http://link.springer.com/content/pdf/10.1186/s40537-020-00385-8.pdf

Reference65 articles.

1. Dy JG, Brodley CE. Feature selection for unsupervised learning. J Mach Learn Res. 2004;5:845–89.

2. Tang J, Alelyani S, Liu H. Feature selection for classification: a review. Data Classification: Algorithms and Applications. 2014;37.

3. Alelyani S, Tang J, Liu H. Feature selection for clustering: a review. Data Clust. 2013;29:110–21.

4. Yu L, Liu H. Feature selection for high-dimensional data: a fast correlation-based filter solution. 2003;856–863.

5. Leung YY, Chang CQ, Hung YS, Fung PCW. Gene selection for brain cancer classification. Conf Proc IEEE Eng Med Biol Soc. 2006;1:5846–9.

Cited by 37 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Residual stress prediction in laser shock peening induced LD-TC4 alloy by data-driven ensemble learning methods;Optics & Laser Technology;2024-09

2. AI-Driven Prediction and Mapping of Soil Liquefaction Risks for Enhancing Earthquake Resilience in Smart Cities;Smart Cities;2024-07-17

3. A predictive modeling approach for Taiwanese diagnosis-related groups medical costs: A focus on laparoscopic appendectomy;Tungs' Medical Journal;2024-07-01

4. Stable multivariate lesion symptom mapping;Aperture Neuro;2024-06-07

5. Advanced Ensemble Classifier Techniques for Predicting Tumor Viability in Osteosarcoma Histological Slide Images;Applied Data Science and Analysis;2024-05-29