Affiliation:
1. Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650504, China
2. Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming 650504, China
Abstract
Motivation
Single-cell RNA sequencing (scRNA-seq) technology has enabled the discovery of gene expression patterns at single-cell resolution. However, owing to technical limitations, scRNA-seq data usually contain excessive zeros, called "dropouts," which may mislead downstream analysis. It is therefore crucial to impute these dropouts to recover the underlying biological information.
Results
We propose a two-step imputation method called tsImpute to impute scRNA-seq data. In the first step, tsImpute adopts a zero-inflated negative binomial distribution to discriminate dropouts from true zeros and performs an initial imputation by calculating the expected expression level. In the second step, it clusters the cells using this modified expression matrix and, based on the clustering, performs the final distance-weighted imputation. Numerical results on both simulated and real data show that tsImpute achieves favorable performance in terms of gene expression recovery, cell clustering, and differential expression analysis.
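The two steps can be illustrated with a minimal Python sketch. It is not the tsImpute R package or its API: the function names, the `threshold` parameter, the use of `p * mu` as the expected expression level, and the inverse-distance weighting are all assumptions made for illustration, the zero-inflated negative binomial parameters (`pi`, `mu`, `size`) are taken as already estimated, and the clustering step is omitted (the `neighbors` are assumed to be cells from the same cluster).

```python
import math


def dropout_probability(pi, mu, size):
    """Posterior probability that an observed zero is a dropout under a
    zero-inflated negative binomial (ZINB) model.

    pi   : zero-inflation (dropout) mixing weight
    mu   : mean of the negative binomial component
    size : dispersion (size) parameter of the negative binomial component
    """
    # Probability that the negative binomial component itself yields a zero.
    nb_zero = (size / (size + mu)) ** size
    return pi / (pi + (1.0 - pi) * nb_zero)


def initial_impute(count, pi, mu, size, threshold=0.5):
    """Step 1 (illustrative): replace a likely-dropout zero with an
    expected expression level; leave all other entries unchanged.

    `threshold` is a hypothetical cutoff, not a tsImpute parameter.
    """
    if count != 0:
        return float(count)
    p = dropout_probability(pi, mu, size)
    # Assumed form of the expectation: dropout probability times NB mean.
    return p * mu if p > threshold else 0.0


def distance_weighted_value(target, neighbors, gene):
    """Step 2 (illustrative): inverse-distance weighted average of one
    gene over neighbor cells, assumed to share the target's cluster."""
    weights = []
    values = []
    for cell in neighbors:
        d = math.dist(target, cell)      # Euclidean distance between cells
        weights.append(1.0 / (d + 1e-8)) # small constant avoids division by zero
        values.append(cell[gene])
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total
```

For example, `initial_impute(0, pi=0.5, mu=9.0, size=1.0)` treats the zero as a dropout because the negative binomial component rarely produces a zero at that mean, whereas a nonzero count is returned unchanged.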
Availability and implementation
The R package of tsImpute is available at https://github.com/ZhengWeihuaYNU/tsImpute.
Funder
National Natural Science Foundation of China
Yunnan Key Laboratory of Intelligent Systems and Computing
Yunnan Province Science Foundation
Research Foundation of the Education Department of Yunnan Province
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability