Event Analysis: Using Transcript Events To Improve Estimates of Abundance in RNA-seq Data

Author:

Newman Jeremy R B (1), Concannon Patrick (2), Tardaguila Manuel (3), Conesa Ana (4), McIntyre Lauren M (1)

Affiliation:

1. Department of Molecular Genetics and Microbiology and Genetics Institute, University of Florida, Gainesville, Florida

2. Genetics Institute and Department of Pathology, Immunology and Laboratory Medicine, University of Florida, Gainesville, Florida

3. Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, Florida, 32610; Wellcome Trust Sanger Institute, Hinxton, United Kingdom

4. Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences; Genetics Institute, University of Florida, Gainesville, Florida, 32610; Genomics of Gene Expression Lab, Prince Felipe Research Center, Valencia, Spain

Abstract

Alternative splicing leverages genomic content by allowing the synthesis of multiple transcripts and, by implication, protein isoforms, from a single gene. However, estimating the abundance of transcripts produced in a given tissue from short sequencing reads is difficult and can result in both the construction of transcripts that do not exist and the failure to identify true transcripts. An alternative approach is to catalog the events that make up isoforms (splice junctions and exons). We present here the Event Analysis (EA) approach, where we project transcripts onto the genome and identify overlapping/unique regions and junctions. In addition, all possible logical junctions are assembled into a catalog. Transcripts are filtered before quantitation based on simple measures: the proportion of the events detected, and the coverage. We find that mapping to a junction catalog is more efficient at detecting novel junctions than mapping in a splice-aware manner. In simulation, we identify 99.8% of true transcripts, whereas iReckon identifies 82% of the true transcripts and creates more transcripts not included in the simulation than were initially used in the simulation. Using PacBio Iso-seq data from a mouse neural progenitor cell model, EA detects 60% of the novel junctions that are combinations of existing exons, while only 43% are detected by STAR. EA further detects ∼5,000 annotated junctions missed by STAR. Filtering transcripts based on the proportion of the transcript detected and the number of reads on average supporting that transcript captures 95% of the PacBio transcriptome. Filtering the reference transcriptome before quantitation results in a more stable estimate of isoform abundance, with improved correlation between replicates. This was particularly evident when EA was applied to an RNA-seq study of type 1 diabetes (T1D), where the coefficient of variation among subjects (n = 81) in the transcript abundance estimates was substantially reduced compared to the estimation using the full reference. EA focuses on individual transcriptional events. These events can be quantitated and analyzed directly or used to identify the probable set of expressed transcripts. Simple rules based on detected events and coverage used in filtering result in a dramatic improvement in isoform estimation without the use of ancillary data (e.g., ChIP, long reads) that may not be available for many studies.
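The filtering rule described in the abstract (keep a transcript when enough of its events are detected and its average read support is high enough) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the input layout (a list of event identifiers per transcript plus a read count per event), and both threshold values are assumptions chosen only for illustration.

```python
from typing import Dict, List, Tuple


def summarize(events: List[str], counts: Dict[str, int]) -> Tuple[float, float]:
    """Return (proportion of events detected, mean reads per event).

    An event (exon region or junction) counts as detected when it has
    at least one read. Both measures are hypothetical stand-ins for the
    "proportion of events detected" and "coverage" named in the abstract.
    """
    if not events:
        return 0.0, 0.0
    reads = [counts.get(event, 0) for event in events]
    detected = sum(1 for r in reads if r > 0)
    return detected / len(events), sum(reads) / len(events)


def filter_transcripts(
    transcripts: Dict[str, List[str]],
    counts: Dict[str, int],
    min_proportion: float = 0.75,   # assumed threshold, not from the source
    min_mean_reads: float = 5.0,    # assumed threshold, not from the source
) -> List[str]:
    """Keep transcripts that pass both illustrative thresholds."""
    kept = []
    for name, events in transcripts.items():
        proportion, mean_reads = summarize(events, counts)
        if proportion >= min_proportion and mean_reads >= min_mean_reads:
            kept.append(name)
    return kept


# Toy data: two transcripts sharing exon1; junc1_3 has no supporting reads.
transcripts = {
    "tx_A": ["exon1", "junc1_2", "exon2"],
    "tx_B": ["exon1", "junc1_3", "exon3"],
}
counts = {"exon1": 40, "junc1_2": 12, "exon2": 35, "exon3": 2}

kept = filter_transcripts(transcripts, counts)
print(kept)
```

In this toy input, tx_B is dropped because only two of its three events have reads, which falls below the assumed 0.75 proportion cutoff, even though the shared exon gives it a reasonable average read count. The reduced transcript set would then be passed to a quantitation tool in place of the full reference.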

Publisher

Oxford University Press (OUP)

Subject

Genetics (clinical), Genetics, Molecular Biology
