Identification of influential observations in high-dimensional survival data through robust penalized Cox regression based on trimming
-
Published:2023
Issue:3
Volume:20
Page:5352-5378
-
ISSN:1551-0018
-
Container-title:Mathematical Biosciences and Engineering
-
language:
-
Short-container-title:MBE
Author:
Sun Hongwei12, Gao Qian2, Zhu Guiming1, Han Chunlei1, Yan Haosen1, Wang Tong2
Affiliation:
1. Department of Health Statistics, School of Public Health and Management, Binzhou Medical University, Yantai City, Shandong 264003, China 2. Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan City, Shanxi 030001, China
Abstract
<abstract>
<p>Penalized Cox regression can efficiently be used for the determination of biomarkers in high-dimensional genomic data related to disease prognosis. However, results of Penalized Cox regression is influenced by the heterogeneity of the samples who have different dependent structure between survival time and covariates from most individuals. These observations are called influential observations or outliers. A robust penalized Cox model (Reweighted Elastic Net-type maximum trimmed partial likelihood estimator, Rwt MTPL-EN) is proposed to improve the prediction accuracy and identify influential observations. A new algorithm AR-Cstep to solve Rwt MTPL-EN model is also proposed. This method has been validated by simulation study and application to glioma microarray expression data. When there were no outliers, the results of Rwt MTPL-EN were close to the Elastic Net (EN). When outliers existed, the results of EN were impacted by outliers. And whenever the censored rate was large or low, the robust Rwt MTPL-EN performed better than EN. and could resist the outliers in both predictors and response. In terms of outliers detection accuracy, Rwt MTPL-EN was much higher than EN. The outliers who "lived too long" made EN perform worse, but were accurately detected by Rwt MTPL-EN. Through the analysis of glioma gene expression data, most of the outliers identified by EN were those "failed too early", but most of them were not obvious outliers according to risk estimated from omics data or clinical variables. Most of the outliers identified by Rwt MTPL-EN were those who "lived too long", and most of them were obvious outliers according to risk estimated from omics data or clinical variables. Rwt MTPL-EN can be adopted to detect influential observations in high-dimensional survival data.</p>
</abstract>
Publisher
American Institute of Mathematical Sciences (AIMS)
Subject
Applied Mathematics,Computational Mathematics,General Agricultural and Biological Sciences,Modeling and Simulation,General Medicine
Reference45 articles.
1. Z. Liu, M. Li, Q. Hua, Y. Li, G. Wang, Identification of an eight-lncrna prognostic model for breast cancer using wgcna network analysis and a cox‑proportional hazards model based on l1-penalized estimation, Int. J. Mol. Med., 44 (2019), 1333–1343. https://doi.org/10.3892/ijmm.2019.4303 2. X. Y. Shen, X. P. Liu, C. K. Song, Y. J. Wang, S. Li, W. D. Hu, Genome‐wide analysis reveals alcohol dehydrogenase 1c and secreted phosphoprotein 1 for prognostic biomarkers in lung adenocarcinoma, J. Cellular Physiol., 234 (2019), 22311–22320. https://doi.org/10.1002/jcp.28797 3. L. Wang, J. Shi, Y. Huang, S. Liu, J. Zhang, H. Ding, et al., A six-gene prognostic model predicts overall survival in bladder cancer patients, Cancer Cell Int., 19 (2019), 229. https://doi.org/10.1186/s12935-019-0950-7 4. J. Choi, S. Park, Y. Yoon, J. Ahn, Improved prediction of breast cancer outcome by identifying heterogeneous biomarkers, Bioinformatics, 33 (2017), 3619–3626. https://doi.org/10.1093/bioinformatics/btx487 5. K. Polyak, Heterogeneity in breast cancer, J. Clin. Invest., 121 (2011), 3786–3788. https://doi.org/10.1172/JCI60534
|
|