Affiliation:
1. Department of Mathematical Sciences, University of Cincinnati, Cincinnati, Ohio, USA
2. Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
3. Department of Statistics, Sungkyunkwan University, Seoul, South Korea
Abstract
Omics data, routinely collected in various clinical settings, are complex and network‐structured. Recent progress in RNA sequencing (RNA‐seq) allows us to explore whole‐genome gene expression profiles and to develop predictive models for disease risk. In this study, we propose a novel Bayesian approach that constructs an RNA‐seq‐based risk score for disease risk prediction by leveraging the gene expression network. Specifically, we consider a hierarchical model with spike and slab priors on the regression coefficients and on the entries of the inverse covariance matrix of the covariates, so that variable selection and network estimation are performed simultaneously in high‐dimensional logistic regression. Through theoretical investigation and simulation studies, we show that our method enjoys desirable consistency properties and achieves superior empirical performance compared with other state‐of‐the‐art methods. We analyse RNA‐seq gene expression data from 441 asthmatic and 254 non‐asthmatic samples to form a weighted network‐guided risk score and benchmark the proposed method against existing approaches for asthma risk stratification.