No one tool to rule them all: Prokaryotic gene prediction tool performance is highly dependent on the organism of study

Published: 2021-05-23

Author:

Dimonaco Nicholas J., Aubrey Wayne, Kenobi Kim, Clare Amanda, Creevey Christopher J.

Abstract

Motivation: The biases in Open Reading Frame (ORF) prediction tools, which have been based on historic genomic annotations from model organisms, impact our understanding of novel genomes and metagenomes. This hinders the discovery of new genomic information, as it results in predictions being biased towards existing knowledge. To date, users have lacked a systematic and replicable approach to identify the strengths and weaknesses of any ORF prediction tool and to choose the right tool for their analysis.

Results: We present an evaluation framework (ORForise) based on a comprehensive set of 12 primary and 60 secondary metrics that facilitate the assessment of the performance of ORF prediction tools. This makes it possible to identify which tool performs better for specific use cases. We use this to assess 15 ab initio and model-based tools representing those most widely used (historically and currently) to generate the knowledge in genomic databases. We find that the performance of any tool is dependent on the genome being analysed, and no individual tool ranked as the most accurate across all genomes or metrics analysed. Even the top-ranked tools produced conflicting gene collections which could not be resolved by aggregation. The ORForise evaluation framework provides users with a replicable, data-led approach to make informed tool choices for novel genome annotations and for refining historical annotations.

Availability: https://github.com/NickJD/ORForise

Contact: nicholas@dimonaco.co.uk

Supplementary information: Supplementary data are available at bioRxiv online.

Publisher

Cold Spring Harbor Laboratory

Cited by 2 articles.

1. Accurate and fast graph-based pangenome annotation and clustering with ggCaller (2023-01-24)

2. FrameRate: learning the coding potential of unassembled metagenomic reads (2022-09-19)