Abstract
Single-cell sequencing technology has enabled correlation analysis of genomic features at the cellular level. However, the high noise and sparsity of single-cell sequencing data make accurate assessment of correlations challenging. This study provides a toolkit, SCSC (https://github.com/thecailab/SCSC), for estimating correlation coefficients in single-cell sequencing data. It comprehensively assessed four strategies (classical, non-zero, dropout-weighted, and imputation) and the impact of data features across various simulated scenarios. The study found that filtering zeros significantly improves estimation accuracy, and that further improvement can be achieved by accounting for the dropout probability. It also identified data features that affect correlation estimation, including expression level, library size, and biological variation.
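The contrast between the classical and non-zero strategies can be illustrated with a minimal Python sketch. This is not the SCSC API: the function names (`simulate`, `classical_corr`, `nonzero_corr`, `weighted_corr`), the toy simulation, and the weighting scheme are all assumptions made for illustration. In particular, the abstract does not state how dropout-based weights are derived, so `weighted_corr` simply accepts arbitrary per-cell weights.

```python
import numpy as np


def simulate(n=2000, rho=0.8, dropout=0.3, seed=0):
    """Toy data (an assumption, not the paper's simulator): two correlated
    genes whose values are independently zeroed out with probability `dropout`."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    expr = rng.multivariate_normal([5.0, 5.0], cov, size=n)
    expr = np.clip(expr, 0.01, None)  # keep true expression strictly positive
    observed = expr * (rng.random((n, 2)) >= dropout)  # dropout sets values to 0
    return observed[:, 0], observed[:, 1]


def classical_corr(x, y):
    """Pearson correlation over all cells, zeros included."""
    return float(np.corrcoef(x, y)[0, 1])


def nonzero_corr(x, y):
    """Pearson correlation over cells where both genes are non-zero."""
    keep = (x > 0) & (y > 0)
    if keep.sum() < 3:
        return float("nan")  # too few cells for a meaningful estimate
    return float(np.corrcoef(x[keep], y[keep])[0, 1])


def weighted_corr(x, y, w):
    """Weighted Pearson correlation with per-cell weights `w`.
    How to derive `w` from dropout probabilities is left open here."""
    w = np.asarray(w, dtype=float)
    mx = np.average(x, weights=w)
    my = np.average(y, weights=w)
    cov_xy = np.sum(w * (x - mx) * (y - my))
    var_x = np.sum(w * (x - mx) ** 2)
    var_y = np.sum(w * (y - my) ** 2)
    return float(cov_xy / np.sqrt(var_x * var_y))


if __name__ == "__main__":
    x, y = simulate()
    print("classical:", classical_corr(x, y))
    print("non-zero: ", nonzero_corr(x, y))
```

In this toy setting the zeros introduced by dropout dominate the variance, so the classical estimate is pulled toward zero while the non-zero estimate stays close to the simulated correlation. With weights of 1 for cells where both genes are observed and 0 otherwise, `weighted_corr` reduces to `nonzero_corr`; graded weights would sit between the two extremes.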
Publisher
Cold Spring Harbor Laboratory