Author:
Alonso-Reyes Daniel Gonzalo, Albarracín Virginia Helena
Abstract
Advances in high-throughput sequencing, large volumes of genomic data, and huge protein databases for homology comparisons have created a need for accessible, accurate, and standardized analysis. Genomic annotation tools have revolutionized gene prediction and metabolic reconstruction, and have achieved state-of-the-art accuracy. Despite these advancements, there is no hierarchical pipeline that prioritizes the annotation of certain valuable ortholog identifiers and Enzyme Commission numbers. Such data can be fed into many tools for metabolic analysis, but their annotation is currently hindered by inaccurate workflows and the retrieval of meaningless information. This is especially true for annotators based on very small or overly large protein databases. In this study, we provide a solution based on a hierarchical methodology and optimized databases, which additionally overcomes data download challenges. The software framework presented here demonstrates highly optimized and accurate results, and provides an analysis pipeline that can accommodate different search algorithms across prokaryotic genomes and metagenomes.
Publisher
Cold Spring Harbor Laboratory