Incorporating RNA-seq data into the zebrafish Ensembl genebuild

Published: 2012-07-12
Volume: 22
Issue: 10
Pages: 2067-2078
ISSN: 1088-9051
Journal: Genome Research
Journal abbreviation: Genome Res.
Language: English

Authors:

John E. Collins, Simon White, Stephen M.J. Searle, Derek L. Stemple

Abstract

Ensembl gene annotation provides a comprehensive catalog of transcripts aligned to the reference sequence. It relies on publicly available species-specific and orthologous transcripts, together with their inferred protein sequences. Gene model accuracy improves as the species-specific component increases, and RNA-seq offers a cost-effective way to increase it. Two zebrafish gene annotations are presented in Ensembl version 62, both built on the Zv9 reference sequence. Firstly, RNA-seq data from five tissues and seven developmental stages were assembled into 25,748 gene models. A 3′-end capture and sequencing protocol was developed to predict the 3′ ends of transcripts, and 46.1% of the original models were subsequently refined. Secondly, a standard Ensembl genebuild that incorporated carefully filtered elements from the RNA-seq-only build, followed by a merge with the manually curated VEGA database, produced a comprehensive annotation of 26,152 genes represented by 51,569 transcripts. The RNA-seq-only and the Ensembl/VEGA genebuilds contribute contrasting elements to the final genebuild. The RNA-seq genebuild was used to adjust the intron/exon boundaries of models defined from orthologous sequences, to confirm their expression, and to improve their 3′ untranslated regions. Importantly, the inferred protein alignments within the Ensembl genebuild provided proof of model contiguity for the RNA-seq models. The zebrafish gene annotation has been enhanced by the incorporation of RNA-seq data, and the pipeline will be applied to other organisms; those with little species-specific cDNA data will generally benefit the most.

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical), Genetics
