Unified methods for feature selection in large-scale genomic studies with censored survival outcomes

Published:2020-03-10 Issue:11 Volume:36 Page:3409-3417
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Spirko-Burns Lauren¹, Devarajan Karthik²

Affiliation:

1. Department of Statistical Science, Temple University

2. Department of Biostatistics & Bioinformatics, Fox Chase Cancer Center, Temple University Health System, Philadelphia, PA, USA

Abstract

Motivation: One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous datasets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards (PH), which is unlikely to hold for each feature. When applied to genomic features exhibiting some form of non-proportional hazards (NPH), these methods could lead to an under- or over-estimation of the effects. We propose a broad array of marginal screening techniques that aid in feature ranking and selection by accommodating various forms of NPH. First, we develop an approach based on Kullback–Leibler information divergence and the Yang–Prentice model that includes methods for the PH and proportional odds (PO) models as special cases. Next, we propose R² measures for the PH and PO models that can be interpreted in terms of explained randomness. Lastly, we propose a generalized pseudo-R² index that includes PH, PO, crossing hazards and crossing odds models as special cases and can be interpreted as the percentage of separability between subjects experiencing the event and not experiencing the event according to feature measurements.

Results: We evaluate the performance of our measures using extensive simulation studies and publicly available datasets in cancer genomics. We demonstrate that the proposed methods successfully address the issue of NPH in genomic feature selection and outperform existing methods.

Availability and implementation: R code for the proposed methods is available at github.com/lburns27/Feature-Selection.

Contact: karthik.devarajan@fccc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
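
Illustrative sketch (not part of the published record): the baseline that the abstract contrasts against, marginal screening with univariate Cox regression, can be outlined as below. This is not the authors' R code and does not implement their Kullback–Leibler or pseudo-R² measures. It assumes the Cox score test at a zero coefficient with Breslow-style handling of ties, and the function names `cox_score_statistic` and `rank_features` are hypothetical.

```python
def cox_score_statistic(times, events, x):
    """Chi-square score statistic (1 df) for one feature under a univariate
    Cox model, evaluated at coefficient 0. Assumes Breslow-style risk sets:
    every subject with time >= the event time is still at risk."""
    n = len(times)
    u = 0.0  # score: sum over events of (x_i - risk-set mean)
    v = 0.0  # information: sum over events of risk-set variance
    for i in range(n):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        risk = [x[j] for j in range(n) if times[j] >= times[i]]
        mean = sum(risk) / len(risk)
        u += x[i] - mean
        v += sum((r - mean) ** 2 for r in risk) / len(risk)
    return u * u / v if v > 0 else 0.0


def rank_features(times, events, features):
    """Rank features (dict: name -> list of measurements) by the score
    statistic, largest first. Each feature is screened on its own."""
    stats = {name: cox_score_statistic(times, events, values)
             for name, values in features.items()}
    return sorted(stats.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    times = [1, 2, 3, 4, 5, 6, 7, 8]
    events = [1, 1, 1, 0, 1, 1, 0, 1]
    features = {
        "g1": [8, 7, 6, 5, 4, 3, 2, 1],     # high values fail early
        "g2": [1, -1, 1, -1, 1, -1, 1, -1],  # unrelated to survival time
    }
    for name, stat in rank_features(times, events, features):
        print(name, round(stat, 3))
```

Because this statistic is derived under proportional hazards, a feature whose effect reverses over time (crossing hazards) can score near zero here, which is the limitation the abstract's NPH-aware measures are meant to address.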

Funder

National Institutes of Health

National Cancer Institute

National Science Foundation

US Army

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability

Link

http://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaa161/33125534/btaa161.pdf

Reference: 43 articles.

1. Review of survival analysis published in cancer journals;Altman;Br. J. Cancer,1995

2. Model misspecification in proportional hazards regression;Anderson;Biometrika,1995

3. Controlling the false discovery rate: a practical and powerful approach to multiple testing;Benjamini;J. R. Stat. Soc,1995

4. Analysis of survival data by the proportional odds model;Bennett;Stat. Med,1983

Cited by 4 articles.

1. Identification of a gene expression signature associated with breast cancer survival and risk that improves clinical genomic platforms;Bioinformatics Advances;2023-01-01

2. Inferring the heritability of bacterial traits in the era of machine learning;Bioinformatics Advances;2023-01-01

3. Elaboration Models with Symmetric Information Divergence;International Statistical Review;2022-04-20

4. A statistical framework for non-negative matrix factorization based on generalized dual divergence;Neural Networks;2021-08