Regulatory network-based imputation of dropouts in single-cell RNA sequencing data

Author:

Leote Ana Carolina, Wu Xiaohui, Beyer Andreas

Abstract

Single-cell RNA sequencing (scRNA-seq) methods are typically unable to quantify the expression levels of all genes in a cell, creating a need for the computational prediction of missing values (‘dropout imputation’). Most existing dropout imputation methods are limited in that they use only the scRNA-seq dataset at hand and do not exploit external gene-gene relationship information. Further, it is unknown whether all genes benefit equally from imputation, or which imputation method works best for a given gene.

Here, we show that a transcriptional regulatory network learned from external, independent gene expression data improves dropout imputation. Using a variety of human scRNA-seq datasets, we demonstrate that our network-based approach outperforms published state-of-the-art methods. The network-based approach performs particularly well for lowly expressed genes, including cell-type-specific transcriptional regulators. Further, the cell-to-cell variation of 12.6% to 48.2% of the genes could not be adequately imputed by any of the methods that we tested. In those cases, gene expression levels were best predicted by the mean expression across all cells, i.e. assuming no measurable expression variation between cells. These findings suggest that different imputation methods are optimal for different genes. We thus implemented an R package called ADImpute (available via Bioconductor: https://bioconductor.org/packages/release/bioc/html/ADImpute.html) that automatically determines the best imputation method for each gene in a dataset.

Our work represents a paradigm shift by demonstrating that there is no single best imputation method. Instead, we propose that imputation should maximally exploit external information and be adapted to gene-specific features, such as expression level and expression variation across cells.

Author summary

Single-cell RNA sequencing (scRNA-seq) allows gene expression to be quantified in individual cells and thus plays a critical role in revealing differences between cells within tissues and in characterizing them in healthy and pathological conditions. Because scRNA-seq captures the RNA content of individual cells, lowly expressed genes, for which few RNA molecules are present in the cell, are easily missed. These events are called ‘dropouts’ and considerably hinder analysis of the resulting data. In this work, we propose to make use of gene-gene relationships, learnt from external and more complete datasets, to estimate the true expression of genes that could not be quantified in a given cell. We show that this approach generally outperforms previously published methods, but also that different genes are better estimated with different methods. To allow the community to use our proposed method and combine it with existing ones, we created the R package ADImpute, available through Bioconductor.
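The idea of choosing an imputation method per gene can be illustrated with a small sketch. It is not the ADImpute implementation (which is an R package) and does not use its API; it is a minimal Python illustration under stated assumptions. The function names, the toy expression values, and the regulator name `TF1` are all hypothetical. The sketch assumes that, for each gene, some observed values have been held out, that a "network" prediction is a weighted sum of regulator expression with weights learned beforehand from external data, and that the baseline is the gene's mean across cells. The method with the lowest mean squared error on the held-out values is then selected for that gene.

```python
from statistics import mean


def mse(predicted, observed):
    """Mean squared error between two equally long sequences."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def baseline_predict(training_values, n_cells):
    """Baseline: predict the gene's mean across cells for every held-out cell."""
    gene_mean = mean(training_values)
    return [gene_mean] * n_cells


def network_predict(coefficients, regulator_profiles):
    """Network-style prediction: weighted sum of regulator expression per cell.

    `coefficients` maps regulator name -> weight. The weights are ASSUMED to
    have been learned beforehand from external, independent expression data.
    """
    return [
        sum(weight * cell[regulator] for regulator, weight in coefficients.items())
        for cell in regulator_profiles
    ]


def choose_method(observed, candidates):
    """Return the candidate with the lowest held-out error, plus all errors."""
    errors = {name: mse(pred, observed) for name, pred in candidates.items()}
    return min(errors, key=errors.get), errors


# Toy data, invented for illustration: regulator expression in three held-out cells.
held_out_cells = [{"TF1": 0.5}, {"TF1": 1.5}, {"TF1": 2.5}]

# Per gene: held-out observed values, remaining (training) values, network weights.
genes = {
    "geneA": {"observed": [1.0, 3.0, 5.0], "training": [2.0, 4.0],
              "coefficients": {"TF1": 2.0}},
    "geneB": {"observed": [2.0, 2.1, 1.9], "training": [2.0, 2.0],
              "coefficients": {"TF1": 1.0}},
}

best = {}
for gene, data in genes.items():
    candidates = {
        "network": network_predict(data["coefficients"], held_out_cells),
        "baseline": baseline_predict(data["training"], len(data["observed"])),
    }
    best[gene], _ = choose_method(data["observed"], candidates)

print(best)  # {'geneA': 'network', 'geneB': 'baseline'}
```

In this toy example, `geneA` tracks its regulator and is best predicted by the network, whereas `geneB` shows almost no variation between cells and is best predicted by its mean, mirroring the abstract's observation that some genes are best imputed by the mean expression across all cells.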

Publisher

Cold Spring Harbor Laboratory

Cited by 6 articles.
