Abstract
The evolution of RNA-seq technologies has yielded datasets of high scientific value that are often generated as condition-associated biological replicates within differential expression studies. As the number of replicates increases, so too does confidence in identifying differentially expressed transcripts. With rapidly expanding RNA-seq data archives, there is an opportunity to augment replicate numbers when conditions of interest overlap at an inter-study level. Despite correction procedures for estimating transcript abundance, a remaining source of error is transcript-level intra-condition count variation, as partially indicated by the discordant results between differential expression analysis tools. Such variation is amplified at an inter-study level. Here, we present TVscript, a tool that removes reference-based transcripts associated with intra-condition variation above specified thresholds. With this tool we explore the effects of removing transcripts associated with varying degrees of intra-condition variation on differential expression analysis. This is done in relation to inter- and intra-study datasets representing brain samples of dogs, wolves and foxes (wolves vs. dogs and aggressive vs. tame foxes). We demonstrate that 20% of the transcripts identified as being differentially expressed are associated with high levels of intra-condition variation. This is an over-representation relative to the reference set. When transcripts harbouring such variation are removed from the reference prior to differential expression analysis, a discordance of 26 to 40% in the lists of differentially expressed transcripts is observed when compared to those obtained using the non-filtered reference. For our data, the removal of transcripts possessing intra-condition variation values within (and above) the 97th and 95th percentiles, for wolves vs. dogs and aggressive vs. tame foxes respectively, maximized the detection of differentially expressed transcripts as a result of alterations within gene-wise dispersion estimates. This analysis provides support for seven genes potentially involved in selection for tameness. TVscript is available at: https://sourceforge.net/projects/tvscript/.
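The percentile-based filtering idea described above can be illustrated with a minimal Python sketch. This is not TVscript's code. The abstract does not say how intra-condition variation is measured, so the sketch makes several assumptions: the coefficient of variation across replicates is the variation measure, the cutoff is a nearest-rank percentile computed separately within each condition, and a transcript is removed if it reaches the cutoff in any condition. All function names and the input layout are hypothetical.

```python
import math
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """Assumed variation measure: population std. dev. divided by the mean."""
    m = mean(counts)
    # Assumption: a transcript with zero mean count is given zero variation.
    return pstdev(counts) / m if m > 0 else 0.0


def percentile_cutoff(values, pct):
    """Nearest-rank percentile of a non-empty list of values."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def filter_transcripts(counts_by_condition, pct):
    """Return sorted transcript IDs that stay in the reference.

    counts_by_condition maps condition -> {transcript_id: [replicate counts]}.
    A transcript is dropped if its variation lies at or above the pct-th
    percentile within any condition (an assumption of this sketch).
    """
    all_ids = set()
    removed = set()
    for transcripts in counts_by_condition.values():
        cv = {tid: coefficient_of_variation(c) for tid, c in transcripts.items()}
        cutoff = percentile_cutoff(list(cv.values()), pct)
        all_ids.update(cv)
        removed.update(tid for tid, value in cv.items() if value >= cutoff)
    return sorted(all_ids - removed)


# Toy replicate counts (invented purely for illustration).
example_counts = {
    "dog": {"t1": [10, 11, 9], "t2": [5, 50, 500],
            "t3": [100, 100, 100], "t4": [20, 21, 19]},
    "wolf": {"t1": [12, 10, 11], "t2": [7, 8, 6],
             "t3": [1, 90, 400], "t4": [30, 29, 31]},
}
kept = filter_transcripts(example_counts, 90)
```

In this toy input, `t2` varies most among the dog replicates and `t3` among the wolf replicates, so both are dropped and `kept` holds `t1` and `t4`. The filtered reference would then be passed to a differential expression tool, which is outside the scope of this sketch.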
Publisher
Cold Spring Harbor Laboratory