TSEBRA: transcript selector for BRAKER

Published:2021-11-25 Issue:1 Volume:22 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Gabriel Lars, Hoff Katharina J., Brůna Tomáš, Borodovsky Mark, Stanke Mario

Abstract

Background: BRAKER is a suite of automatic pipelines, BRAKER1 and BRAKER2, for the accurate annotation of protein-coding genes in eukaryotic genomes. Each pipeline trains statistical models of protein-coding genes based on the provided evidence and then predicts protein-coding genes in genomic sequences using both the extrinsic evidence and the statistical models. For training and prediction, BRAKER1 and BRAKER2 incorporate complementary extrinsic evidence: BRAKER1 uses only RNA-seq data, while BRAKER2 uses only a database of cross-species proteins. The BRAKER suite has so far not been able to reliably exceed the accuracy of BRAKER1 and BRAKER2 when incorporating both types of evidence simultaneously. Currently, for a novel genome project where both RNA-seq and protein data are available, the best option is to run both pipelines independently and to pick the one output that is likely better. As a result, one of the two types of extrinsic evidence remains unexploited.

Results: We present TSEBRA, a software tool that selects gene predictions (transcripts) from the sets generated by BRAKER1 and BRAKER2. TSEBRA uses a set of rules to compare scores of overlapping transcripts based on their support by RNA-seq and homologous protein evidence. We show in computational experiments on genomes of 11 species that TSEBRA achieves higher accuracy than either BRAKER1 or BRAKER2 running alone and that TSEBRA compares favorably with the combiner tool EVidenceModeler.

Conclusion: TSEBRA is an easy-to-use and fast software tool. It can be used in concert with the BRAKER pipeline to generate a gene prediction set supported by both RNA-seq and homologous protein evidence.

Funder

National Institutes of Health

Universität Greifswald

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/s12859-021-04482-0.pdf

Reference: 45 articles.

1. Sayers EW, Cavanaugh M, Clark K, Pruitt KD, Schoch CL, Sherry ST, et al. GenBank. Nucleic Acids Res. 2021;49(D1):D92–6.

2. National Center for Biotechnology Information (NCBI). GenBank eukaryotic genome reports; 2021. Accessed 01 May 2021. https://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/.

3. National Center for Biotechnology Information (NCBI). Eukaryotic Genome Annotation at NCBI; 2021. Accessed 01 May 2021. https://www.ncbi.nlm.nih.gov/genome/annotation_euk/.

4. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10(1):57–63.

5. Gremme G. Computational gene structure prediction [dissertation]. Staats-und Universitätsbibliothek Hamburg Carl von Ossietzky; 2012.

Cited by 133 articles.

1. Genomes of diverse Actinidia species provide insights into cis-regulatory motifs and genes associated with critical traits;BMC Biology;2024-09-11

2. The first Chromosomal-level genome assembly of Sageretia thea using Nanopore long reads and Pore-C technology;Scientific Data;2024-09-04

3. Highly contiguous genome assembly and gene annotation of the short-finned eel (Anguilla bicolor pacifica);Scientific Data;2024-08-30

4. Chromosome-level genome assembly and methylome profile enables insights for the conservation of endangered loggerhead sea turtles;2024-08-28

5. Innovations in Alginate Catabolism Leading to Heterotrophy and Adaptive Evolution of Diatoms;2024-08-28