False signals induced by single-cell imputation

Published:2018-11-02 Issue: Volume:7 Page:1740
ISSN:2046-1402
Container-title:F1000Research
language:en
Short-container-title:F1000Res

Author:

Andrews Tallulah S., Hemberg Martin

Abstract

Background: Single-cell RNASeq is a powerful tool for measuring gene expression at the resolution of individual cells. A significant challenge in the analysis of these data is the large number of zero values, which represent either missing data or no expression. Several imputation approaches have been proposed to deal with this issue, but since these methods generally rely on structure inherent to the dataset under consideration, they may not provide any additional information.

Methods: We evaluated the risk of generating false-positive or irreproducible results when imputing data with five different methods. We applied each method to a variety of simulated datasets as well as to permuted real single-cell RNASeq datasets, and considered the number of false-positive gene-gene correlations and differentially expressed genes. Using matched 10X Chromium and Smartseq2 data from the Tabula Muris database, we examined the reproducibility of markers before and after imputation.

Results: The extent of false-positive signals introduced by imputation varied considerably by method. The data-smoothing methods, MAGIC and knn-smooth, generated a very high number of false positives in both real and simulated data. Model-based imputation methods typically generated fewer false positives, but this varied greatly depending on how well datasets conformed to the underlying model. Furthermore, only SAVER exhibited reproducibility comparable to unimputed data across matched data.

Conclusions: Imputation of single-cell RNASeq data introduces circularity that can generate false-positive results. Thus, statistical tests applied to imputed data should be treated with care. Additional filtering by effect size can reduce but not fully eliminate these effects. Of the methods we considered, SAVER was the least likely to generate false or irreproducible results, and should thus be favoured over alternatives if imputation is necessary.
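The permutation check described under Methods can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' pipeline. It assumes the permutation shuffles each gene's values independently across cells (the abstract does not specify the scheme), so no true gene-gene correlation remains and any pair flagged afterwards is a false positive. The names `permute_genes`, `count_correlated_pairs` and `knn_mean_smooth` are hypothetical. The smoother is a simple k-nearest-neighbour average used as a stand-in; it is not MAGIC, knn-smooth, SAVER or any other method evaluated in the paper, and the 0.3 correlation cut-off is an arbitrary illustrative choice.

```python
import numpy as np


def permute_genes(counts, rng):
    """Shuffle each gene (column) independently across cells (rows).

    Per-gene distributions are preserved; gene-gene structure is destroyed.
    """
    permuted = np.empty_like(counts)
    for g in range(counts.shape[1]):
        permuted[:, g] = rng.permutation(counts[:, g])
    return permuted


def count_correlated_pairs(matrix, threshold):
    """Count gene pairs whose |Pearson r| across cells exceeds threshold."""
    keep = matrix.std(axis=0) > 0  # constant genes have undefined correlation
    if keep.sum() < 2:
        return 0
    r = np.corrcoef(matrix[:, keep], rowvar=False)
    upper = np.triu_indices_from(r, k=1)  # each pair once, no diagonal
    return int((np.abs(r[upper]) > threshold).sum())


def knn_mean_smooth(counts, k):
    """Hypothetical stand-in smoother: replace each cell by the mean of its
    k nearest cells (Euclidean distance on log1p counts; the cell itself is
    usually among them)."""
    logc = np.log1p(counts)
    sq = (logc ** 2).sum(axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (logc @ logc.T)
    neighbours = np.argsort(dist2, axis=1)[:, :k]
    return counts[neighbours].mean(axis=1)


# Toy data (an assumption, not real scRNA-seq): 200 cells x 50 genes.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=1.0, size=(200, 50)).astype(float)

null = permute_genes(counts, rng)
before = count_correlated_pairs(null, threshold=0.3)
after = count_correlated_pairs(knn_mean_smooth(null, k=15), threshold=0.3)
print(f"pairs with |r| > 0.3 on permuted data: raw={before}, smoothed={after}")
```

Any imputation function that maps a cells-by-genes matrix to a matrix of the same shape can be substituted for `knn_mean_smooth`. How far the count moves depends on the method, the data and the threshold, so the toy numbers printed here should not be read as a reproduction of the paper's results.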

Funder

Wellcome Trust

Publisher

F1000 Research Ltd

Subject

General Pharmacology, Toxicology and Pharmaceutics,General Immunology and Microbiology,General Biochemistry, Genetics and Molecular Biology,General Medicine

Link

https://f1000research.com/articles/7-1740/v1/pdf

Cited by 135 articles.

1. scGAAC: A graph attention autoencoder for clustering single-cell RNA-sequencing data;Methods;2024-09

2. From Cell to Gene: Deciphering the Mechanism of Heart Failure With Single‐Cell Sequencing;Advanced Science;2024-08-19

3. scRNMF: An imputation method for single-cell RNA-seq data by robust and non-negative matrix factorization;PLOS Computational Biology;2024-08-08

4. DeepIMAGER: Deeply Analyzing Gene Regulatory Networks from scRNA-seq Data;Biomolecules;2024-06-27

5. Branching topology of the human embryo transcriptome revealed by Entropy Sort Feature Weighting;Development;2024-06-01