Robust and accurate estimation of cellular fraction from tissue omics data via ensemble deconvolution

Author:

Cai Manqi (1), Yue Molin (1), Chen Tianmeng (2, 3), Liu Jinling (4, 5), Forno Erick (6), Lu Xinghua (7), Billiar Timothy (2), Celedón Juan (6), McKennan Chris (8), Chen Wei (6), Wang Jiebiao (1)

Affiliation:

1. Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA

2. Department of Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA

3. Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA

4. Department of Engineering Management and Systems Engineering, Missouri University of Science and Technology, Rolla, MO 65409, USA

5. Department of Biological Sciences, Missouri University of Science and Technology, Rolla, MO 65409, USA

6. Department of Pediatrics, University of Pittsburgh Medical Center Children’s Hospital of Pittsburgh, Pittsburgh, PA 15224, USA

7. Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA

8. Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15213, USA

Abstract

Motivation: Tissue-level omics data such as transcriptomics and epigenomics are an average across diverse cell types. To extract cell-type-specific (CTS) signals, dozens of cellular deconvolution methods have been proposed to infer cell-type fractions from tissue-level data. However, these methods produce vastly different results under various real data settings. Simulation-based benchmarking studies showed no universally best deconvolution approach. Ensemble methods have been attempted, but they only aggregate multiple single-cell references or reference-free deconvolution methods.

Results: To achieve a robust estimation of cellular fractions, we proposed EnsDeconv (Ensemble Deconvolution), which adopts CTS robust regression to synthesize the results from 11 single deconvolution methods, 10 reference datasets, 5 marker gene selection procedures, 5 data normalizations and 2 transformations. Unlike most benchmarking studies, which are based on simulations, we compiled four large real datasets, 4937 tissue samples in total, with measured cellular fractions and bulk gene expression from different tissues. Comprehensive evaluations demonstrated that EnsDeconv yields more stable, robust and accurate fractions than existing methods. We illustrated that EnsDeconv-estimated cellular fractions enable various CTS downstream analyses, such as testing for differential fractions associated with clinical variables. We further extended EnsDeconv to analyze bulk DNA methylation data.

Availability and implementation: EnsDeconv is freely available as an R package from https://github.com/randel/EnsDeconv. The RNA microarray data from the TRAUMA study are available in GEO (GSE36809). The demographic and clinical phenotypes can be shared on reasonable request to the corresponding authors. The RNA-seq data from the EVAPR study cannot be shared publicly, to protect the privacy of the individuals who participated in the clinical research, in compliance with the IRB approval at the University of Pittsburgh. The RNA microarray data from the FHS study are available from dbGaP (phs000007.v32.p13). The RNA-seq data from the ROS study were downloaded from the AD Knowledge Portal.

Supplementary information: Supplementary data are available at Bioinformatics online.
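To make the ensemble idea in the Results concrete, the following is a minimal sketch of combining fraction estimates from several deconvolution runs. It is not the EnsDeconv algorithm: the abstract names CTS robust regression but does not specify it, so this sketch swaps in a simple elementwise median followed by renormalization. The function name `median_ensemble` and the toy numbers are hypothetical and chosen only for illustration.

```python
from statistics import median


def median_ensemble(estimates):
    """Combine fraction estimates from several deconvolution runs.

    estimates: list of runs; each run is a list of samples; each sample is a
    list of cell-type fractions (all runs must share the same shape).
    Returns one samples x cell-types table whose rows sum to 1.

    Assumption: an elementwise median stands in for the CTS robust
    regression named in the abstract; this is not the published method.
    """
    n_samples = len(estimates[0])
    n_types = len(estimates[0][0])
    combined = []
    for i in range(n_samples):
        # The median across runs is insensitive to a few outlying runs.
        row = [median(run[i][c] for run in estimates) for c in range(n_types)]
        total = sum(row)
        # Renormalize so the fractions of each sample sum to 1.
        combined.append([v / total for v in row] if total > 0 else row)
    return combined


# Hypothetical toy input: three runs, two samples, two cell types.
runs = [
    [[0.60, 0.40], [0.20, 0.80]],
    [[0.70, 0.30], [0.30, 0.70]],
    [[0.10, 0.90], [0.25, 0.75]],  # this run disagrees strongly on sample 0
]
ensemble = median_ensemble(runs)
```

In this toy input the third run disagrees strongly with the other two on the first sample, and the median keeps that run from dominating the combined estimate, which is the kind of robustness an ensemble over many method, reference and normalization choices aims for.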

Funder

University of Pittsburgh Brain Institute

University of Pittsburgh Medical Center Competitive Medical Research Fund

National Institutes of Health

University of Pittsburgh Center for Research Computing

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

