Author:
McDermaid Adam, Chen Xin, Zhang Yiran, Xie Juan, Wang Cankun, Ma Qin
Abstract
Motivation: One of the main benefits of modern RNA-sequencing (RNA-Seq) technology is more accurate gene expression estimation compared with previous generations of expression data, such as the microarray. However, numerous issues can cause an RNA-Seq read to map to multiple locations on the reference genome with the same alignment score, which occurs in plant, animal, and metagenome samples. Such a read is called a multiple-mapping read (MMR). The impact of these MMRs is reflected in gene expression estimation and in all downstream analyses, including differential gene expression and functional enrichment. Current analysis pipelines lack tools to effectively test the reliability of gene expression estimates, and are thus incapable of ensuring the validity of downstream analyses.
Results: Our investigation of 95 RNA-Seq datasets from seven species (totaling 1,951 GB) indicates that, on average, roughly 22% of all reads are MMRs in plant and animal species. Here we present a tool called GeneQC (Gene expression Quality Control), which can accurately estimate the reliability of each gene's expression level. The underlying algorithm is based on extracted genomic and transcriptomic features, which are combined using elastic-net regularization and mixture model fitting to provide a clearer picture of mapping uncertainty for each gene. GeneQC allows researchers to identify reliable expression estimates and to conduct further analysis on gene expression that is of sufficient quality. The tool also enables researchers to investigate continued re-alignment methods to obtain more accurate expression estimates for genes with low reliability.
Availability: GeneQC is freely available at http://bmbl.sdstate.edu/GeneQC/home.html.
Contact: qin.ma@sdstate.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
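To make the MMR definition above concrete, the following is a minimal Python sketch, not GeneQC's algorithm (which, per the abstract, combines genomic and transcriptomic features using elastic-net regularization and mixture model fitting). It assumes a hypothetical toy input in which each read carries a list of (gene, alignment score) pairs, flags a read as an MMR when its best score is shared by two or more locations, and reports the fraction of MMRs among the reads whose best alignments hit each gene. The function names and toy data are illustrative assumptions only.

```python
from collections import defaultdict


def is_multi_mapping(alignments):
    """Return True if the best alignment score is shared by >= 2 locations.

    `alignments` is a list of (gene, score) pairs; each pair stands for one
    candidate location (hypothetical toy representation).
    """
    if not alignments:
        return False
    best = max(score for _, score in alignments)
    return sum(1 for _, score in alignments if score == best) > 1


def mmr_fraction_per_gene(reads):
    """Map each gene to the fraction of its best-scoring reads that are MMRs.

    `reads` is a dict: read id -> list of (gene, score) pairs.
    """
    total = defaultdict(int)  # reads whose best alignment hits the gene
    multi = defaultdict(int)  # subset of those reads that are MMRs
    for alignments in reads.values():
        if not alignments:
            continue
        best = max(score for _, score in alignments)
        top_genes = {gene for gene, score in alignments if score == best}
        mmr = is_multi_mapping(alignments)
        for gene in top_genes:
            total[gene] += 1
            if mmr:
                multi[gene] += 1
    return {gene: multi[gene] / total[gene] for gene in total}


# Hypothetical toy data, for illustration only.
toy_reads = {
    "r1": [("geneA", 60)],                  # unique best hit
    "r2": [("geneA", 60), ("geneB", 60)],   # tie across two genes: MMR
    "r3": [("geneB", 60), ("geneC", 42)],   # one clear best hit: not an MMR
    "r4": [("geneC", 55), ("geneC", 55)],   # tie at two loci in one gene: MMR
}
fractions = mmr_fraction_per_gene(toy_reads)
print(fractions)
```

A per-gene fraction like this is only a crude indicator of mapping uncertainty; it ignores the sequence-similarity and transcript-level features that the abstract says GeneQC extracts.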
Publisher
Cold Spring Harbor Laboratory