Author:
Zhou Xueping, Cai Manqi, Yue Molin, Celedón Juan, Ding Ying, Chen Wei, Li Yanming
Abstract
We propose a supervised learning algorithm that performs feature selection and outcome prediction for genomic data with multi-phenotypic responses. The algorithm incorporates genome and/or phenotype grouping structures, together with phenotype correlation structures, into feature selection, effect estimation, and outcome prediction under a penalized multi-response linear regression model. Extensive simulations demonstrate its superior performance over competing methods. We apply the proposed algorithm to two omics studies. In the first study, we identified novel association signals between multivariate gene expression and high-dimensional DNA methylation profiles, providing biological insight into how CpG sites regulate gene expression. The second study concerns cell type deconvolution, where the proposed algorithm achieved better cell type fraction predictions from high-dimensional gene expression data.
Publisher
Cold Spring Harbor Laboratory