Abstract
Background
Eukaryotic genes are often composed of multiple exons that are stitched together by splicing out the intervening introns. These exons may be conditionally joined in different combinations to produce a collection of related, but distinct, mRNA transcripts. For protein-coding genes, these products of alternative splicing lead to the production of related protein variants (isoforms) of a gene. Complete labeling of the protein-coding content of a eukaryotic genome requires discovery of mRNA encoding all isoforms. However, it is impractical to enumerate all possible combinations of tissue, developmental stage, and environmental context, so many true exons go unlabeled in genome annotations.
Results
One way to address the combinatorial challenge of finding all isoforms in a single organism A is to leverage sequencing efforts for other organisms. Each time a new organism is sequenced, it may be under a new combination of conditions, so a previously unobserved isoform may be sequenced. We present Diviner, a software tool that identifies previously undocumented exons in organisms by comparing isoforms across species. We demonstrate Diviner's utility by locating hundreds of novel exons in the genomes of human, mouse, and rat, as well as in the ferret genome. Further, we provide analyses supporting the notion that most of the new exons reported by Diviner are likely to be part of a true (but unobserved) isoform of the containing species.
Publisher
Cold Spring Harbor Laboratory