Affiliation:
1. Central South University School of Computer Science and Engineering, , Changsha 410083 , China
Abstract
AbstractSingle-cell RNA-sequencing technology (scRNA-seq) brings research to single-cell resolution. However, a major drawback of scRNA-seq is large sparsity, i.e. expressed genes with no reads due to technical noise or limited sequence depth during the scRNA-seq protocol. This phenomenon is also called ‘dropout’ events, which likely affect downstream analyses such as differential expression analysis, the clustering and visualization of cell subpopulations, cellular trajectory inference, etc. Therefore, there is a need to develop a method to identify and impute these dropout events. We propose Bubble, which first identifies dropout events from all zeros based on expression rate and coefficient of variation of genes within cell subpopulation, and then leverages an autoencoder constrained by bulk RNA-seq data to only impute those values. Unlike other deep learning-based imputation methods, Bubble fuses the matched bulk RNA-seq data as a constraint to reduce the introduction of false positive signals. Using simulated and several real scRNA-seq datasets, we demonstrate that Bubble enhances the recovery of missing values, gene-to-gene and cell-to-cell correlations, and reduces the introduction of false positive signals. Regarding some crucial downstream analyses of scRNA-seq data, Bubble facilitates the identification of differentially expressed genes, improves the performance of clustering and visualization, and aids the construction of cellular trajectory. More importantly, Bubble provides fast and scalable imputation with minimal memory usage.
Funder
Hunan Postgraduate Research and Innovation Project
National Natural Science Foundation of China
Publisher
Oxford University Press (OUP)
Subject
Molecular Biology,Information Systems
Cited by
6 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献