Affiliation:
1. College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China
2. College of Data Science, Taiyuan University of Technology, Taiyuan 030024, China
Abstract
Biomarker selection for predictive analytics aims to identify a minimal subset of genes that is maximally predictive of an outcome of interest. Lung cancer gene expression datasets make this especially challenging because of their small sample sizes, high dimensionality and high noise, and because important biomarkers reproduce poorly across studies. In this paper, we propose a meta-analysis-based regularized orthogonal matching pursuit (MA-ROMP) algorithm that gains strength by using multiple datasets to identify important genomic biomarkers efficiently, while keeping the selection flexible across datasets to account for data heterogeneity through a hierarchical decomposition of the regression coefficients. As a lung cancer case study, we downloaded GSE10072, GSE19188 and GSE19804 from the GEO database; these datasets differ in experimental conditions, sample preparation methods, study groups, etc. Compared with state-of-the-art methods, our method shows the highest accuracy, of up to 95.63%, and the best discriminative ability (AUC 0.9756), together with a more than 15-fold decrease in training time. The experimental results on both simulated data and several lung cancer gene expression datasets demonstrate that MA-ROMP is a more effective tool for biomarker selection and cancer prediction.
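The abstract does not state the MA-ROMP update rules, so the following is only a minimal sketch of plain, single-dataset orthogonal matching pursuit, the greedy selection scheme named in the method's title. The function name `omp_select`, its parameters and the column normalization are assumptions made for illustration; the meta-analysis coupling across datasets, the regularization step and the hierarchical decomposition of coefficients are not implemented here.

```python
import numpy as np


def omp_select(X, y, n_features):
    """Greedy orthogonal matching pursuit (illustrative, single dataset).

    X: (n_samples, n_genes) expression matrix (hypothetical input).
    y: (n_samples,) outcome vector.
    Returns the indices of the selected columns and their
    least-squares coefficients.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Scale columns to unit norm so correlations are comparable.
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0
    Xn = X / norms

    residual = y.copy()
    selected = []
    coef = np.zeros(0)

    for _ in range(n_features):
        # Pick the column most correlated with the current residual.
        corr = np.abs(Xn.T @ residual)
        if selected:
            corr[selected] = -np.inf  # never reselect a column
        selected.append(int(np.argmax(corr)))

        # Refit by least squares on all selected columns, which makes
        # the new residual orthogonal to every selected column.
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ coef

    return selected, coef
```

Calling `omp_select(X, y, k)` on one expression matrix returns `k` candidate gene indices. A multi-dataset method such as the one described above would additionally need a rule for sharing or relaxing the selection across studies, which this sketch leaves out.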
Funder
Natural Science Foundation of China
Central Government’s Guide to Local Science and Technology Development Fund
Natural Science Foundation of Shanxi Province
Foundation of Taiyuan University of Technology
Subject
General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)