Variability in estimated gene expression among commonly used RNA-seq pipelines

Published:2020-02-17 Issue:1 Volume:10 Page:
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Arora Sonali, Pattwell Siobhan S., Holland Eric C., Bolouri Hamid

Abstract

RNA-sequencing data is widely used to identify disease biomarkers and therapeutic targets using numerical methods such as clustering, classification, regression, and differential expression analysis. Such approaches rely on the assumption that mRNA abundance estimates from RNA-seq are reliable estimates of true expression levels. Here, using data from five RNA-seq processing pipelines applied to 6,690 human tumor and normal tissues, we show that nearly 88% of protein-coding genes have similar gene expression profiles across all pipelines. However, for >12% of protein-coding genes, current best-in-class RNA-seq processing pipelines differ in their abundance estimates by more than four-fold when applied to exactly the same samples and the same set of RNA-seq reads. Expression fold changes are similarly affected. Many of the impacted genes are widely studied disease-associated genes. We show that impacted genes exhibit diverse patterns of discordance among pipelines, suggesting that many inter-pipeline differences contribute to overall uncertainty in mRNA abundance estimates. A concerted, community-wide effort will be needed to develop gold-standards for estimating the mRNA abundance of the discordant genes reported here. In the meantime, our list of discordantly evaluated genes provides an important resource for robust marker discovery and target selection.

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

http://www.nature.com/articles/s41598-020-59516-z.pdf

Reference 38 articles.

1. Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).

2. Carithers, L. J. et al. A Novel Approach to High-Quality Postmortem Tissue Procurement: The GTEx Project. Biopreserv. Biobank. 13, 311–319 (2015).

3. Grossman, R. L. et al. Toward a Shared Vision for Cancer Genomic Data. N. Engl. J. Med. 375, 1109–1112 (2016).

4. Rahman, M. et al. Alternative preprocessing of RNA-Sequencing data in The Cancer Genome Atlas leads to improved analysis results. Bioinformatics 31, 3666–3672 (2015).

5. Papatheodorou, I. et al. Expression Atlas: gene and protein expression across multiple studies and organisms. Nucleic Acids Res. 46, D246–D251 (2018).

Cited by 48 articles.

1. HOXD12 defines an age-related aggressive subtype of oligodendroglioma;Acta Neuropathologica;2024-09-11

2. Assessment of NanoString technology as a tool for profiling circulating miRNA in maternal blood during pregnancy;Extracellular Vesicles and Circulating Nucleic Acids;2024-09-05

3. Identifying the stability of housekeeping genes to be used for the quantitative real-time PCR normalization in retinal tissue of streptozotocin-induced diabetic rats;International Journal of Ophthalmology;2024-05-18

4. U-CAN-seq: A Universal Competition Assay by Nanopore Sequencing;Viruses;2024-04-19

5. Haploinsufficiency of phosphodiesterase 10A activates PI3K/AKT signaling independent of PTEN to induce an aggressive glioma phenotype;Genes & Development;2024-03-01