Abstract
Comprehensive analysis of single-cell RNA sequencing (scRNA-seq) data can enhance our understanding of cellular diversity and aid in the development of personalized therapies. The abundance of missing values, known as dropouts, makes the analysis of scRNA-seq data challenging. Most traditional methods assume specific distributions for the missing values, which limits their ability to capture the intricacy of high-dimensional scRNA-seq data. Moreover, the imputation performance of traditional methods decreases as the missing rate increases. We propose a novel f-divergence based generative adversarial imputation method, called sc-fGAIN, for scRNA-seq data imputation. Our studies identify four f-divergence functions, namely cross-entropy, Kullback-Leibler (KL), reverse KL, and Jensen-Shannon, that can be effectively integrated with the generative adversarial imputation network to generate imputed values without any such assumptions, and we prove mathematically that the distribution of data imputed by the sc-fGAIN algorithm is the same as the distribution of the original data. Analysis of real scRNA-seq data shows that, compared to many traditional methods, the values imputed by sc-fGAIN have a smaller root-mean-square error. The method is also robust to varying missing rates and reduces imputation variability. The flexibility offered by the f-divergence allows sc-fGAIN to accommodate various types of data, making it a more universal approach for imputing missing values in scRNA-seq data.
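To make the notion of an f-divergence concrete, the following minimal sketch evaluates D_f(P || Q) = sum_x q(x) f(p(x)/q(x)) for discrete distributions, using the standard generator functions for KL, reverse KL, and Jensen-Shannon. It is an illustration only, not the sc-fGAIN implementation: the function names are hypothetical, the distributions are assumed to have full support, and the cross-entropy variant is omitted because the abstract does not state its exact form.

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)).

    Assumes p and q are discrete distributions with strictly positive entries.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

# Generator functions f(t); each is convex on t > 0 with f(1) = 0.
def kl(t):
    return t * np.log(t)

def reverse_kl(t):
    return -np.log(t)

def jensen_shannon(t):
    return 0.5 * (t * np.log(2 * t / (1 + t)) + np.log(2 / (1 + t)))

# Toy distributions chosen for illustration only.
p = [0.5, 0.5]
q = [0.25, 0.75]
for name, f in [("KL", kl), ("reverse KL", reverse_kl), ("JS", jensen_shannon)]:
    print(name, f_divergence(p, q, f))
```

Each divergence is zero when the two distributions coincide and positive otherwise, which is the property an adversarial imputation objective relies on when matching the imputed-data distribution to the observed one.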
Funder
National Institutes of Health
intramural President’s Research Funds
Publisher
Public Library of Science (PLoS)