Abstract
In many real-world applications, such as those based on electronic health records, prognostic prediction of patient survival relies on heterogeneous sets of clinical laboratory measurements. To address the trade-off between the predictive accuracy of a prognostic model and the cost of its clinical implementation, we propose an optimized L0-pseudonorm approach for learning sparse solutions in multivariable regression. Model sparsity is enforced by a cardinality constraint that restricts the number of nonzero coefficients in the model, which makes the optimization problem NP-hard. In addition, we generalize the cardinality constraint to grouped feature selection, which makes it possible to identify key sets of predictors that may be measured together as a kit in clinical practice. We demonstrate our cardinality constraint-based feature subset selection method, named OSCAR, in the prognostic prediction of prostate cancer patients, where it identifies the key explanatory predictors at different levels of model sparsity. We further explore how model sparsity affects model accuracy and implementation cost. Lastly, we demonstrate that the methodology generalizes to high-dimensional transcriptomics data.
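To make the cardinality constraint concrete, the following minimal sketch enumerates every feature subset with at most k nonzero coefficients and keeps the one with the lowest residual sum of squares. It is an illustration only, not the OSCAR algorithm: it assumes an ordinary least-squares loss rather than a survival model, it uses synthetic data, and the function name `best_subset` is hypothetical. The exhaustive enumeration also shows why the problem becomes intractable as the number of candidate features grows.

```python
from itertools import combinations

import numpy as np


def best_subset(X, y, k):
    """Exhaustive search for the least-squares fit with at most k nonzero coefficients.

    Hypothetical helper for illustration; not part of any published package.
    """
    n_features = X.shape[1]
    best_rss, best_support, best_coef = np.inf, (), np.zeros(n_features)
    for size in range(1, k + 1):
        # Every support of this size is tried, so the cost grows combinatorially.
        for support in combinations(range(n_features), size):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = float(np.sum((y - X[:, cols] @ coef) ** 2))
            if rss < best_rss:
                best_rss, best_support = rss, support
                best_coef = np.zeros(n_features)
                best_coef[cols] = coef
    return best_support, best_coef, best_rss


# Synthetic data (an assumption for this sketch): only features 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
true_coef = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_coef + 0.1 * rng.normal(size=200)

# Sweep the cardinality bound to see how sparsity trades off against fit.
for k in (1, 2, 3):
    support, coef, rss = best_subset(X, y, k)
    print(f"k={k}: support={support}, RSS={rss:.3f}")
```

Sweeping k in this way mirrors the idea of inspecting the selected predictors at different levels of model sparsity; a grouped variant would enumerate sets of feature groups (for example, measurement kits) instead of individual columns.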
Funder
University of Turku Graduate School
Academy of Finland
Cancer Society of Finland
Sigrid Jusélius Foundation
University of Turku
Cancer Foundation Finland
Hospital District of Helsinki and Uusimaa
Helse Sør-Øst
Radium Hospital Foundation
European Union’s Horizon 2020 Research and Innovation Programme
Finnish Cancer Institute (FICAN Cancer Researcher) and Finnish Cultural Foundation
Publisher
Public Library of Science (PLoS)
Subject
Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modeling and Simulation; Ecology, Evolution, Behavior and Systematics