How tool combinations in different pipeline versions affect the outcome in RNA-seq analysis

Published: 2023-10-06

Author:

Perelo Louisa Wessels, Gabernet Gisela, Straub Daniel, Nahnsen Sven

Abstract

Data analysis tools are continuously changed and improved over time. To test how these changes influence the comparability between analyses, the outputs of different workflow options of the nf-core/rnaseq pipeline were compared. Five pipeline settings (STAR+Salmon, STAR+RSEM, STAR+featureCounts, HiSAT+featureCounts, and the pseudoaligner Salmon) were run on three datasets (human, Arabidopsis, zebrafish) containing spike-ins of the External RNA Control Consortium (ERCC). Fold change ratios and differential expression of genes and spike-ins were used to compare the different tool and version settings of the pipeline. The pipelines overlapped by 85% in their differential gene classification. Genes interpreted with a bias were mostly those present at lower concentrations. The number of isoforms and the number of exons per gene were also determinants. Previous pipeline versions using featureCounts showed a higher sensitivity for detecting one-isoform genes such as the ERCC spike-ins. To ensure data comparability in long-term analysis series, it is advisable either to stay with the pipeline version the series was initialized with or to run both versions during a transition period, so that the target genes are treated the same way.
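As a rough illustration of what an overlap in differential gene classification between pipelines can mean, the sketch below computes the fraction of genes that receive the same label from every pipeline. The abstract does not state how the overlap was defined, so this definition is an assumption, as are the function name `classification_overlap`, the labels "up", "down" and "ns", and the toy gene IDs. It is not the authors' method.

```python
from typing import Dict


def classification_overlap(calls: Dict[str, Dict[str, str]]) -> float:
    """Fraction of shared genes that get the same label from every pipeline.

    `calls` maps a pipeline name to a {gene_id: label} dict. The labels
    ("up", "down", "ns") are hypothetical; any hashable label works.
    Only genes reported by all pipelines are compared (an assumption).
    """
    if not calls:
        raise ValueError("no pipelines given")

    per_pipeline = list(calls.values())

    # Restrict the comparison to genes present in every pipeline's output.
    shared = set(per_pipeline[0])
    for labels in per_pipeline[1:]:
        shared &= set(labels)
    if not shared:
        return 0.0

    # A gene agrees when all pipelines assign it one and the same label.
    agreeing = sum(
        1
        for gene in shared
        if len({labels[gene] for labels in per_pipeline}) == 1
    )
    return agreeing / len(shared)


# Toy data, invented for illustration only.
example = {
    "star_salmon": {"g1": "up", "g2": "ns", "g3": "down", "g4": "ns"},
    "star_rsem": {"g1": "up", "g2": "ns", "g3": "down", "g4": "up"},
    "salmon": {"g1": "up", "g2": "ns", "g3": "down", "g4": "ns"},
}
print(classification_overlap(example))  # 0.75: g4 is classified differently
```

Restricting the comparison to genes reported by all pipelines keeps the denominator the same for each pipeline; a union-based denominator would give a lower value whenever a pipeline drops genes.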

Publisher

Cold Spring Harbor Laboratory
