AnnotaPipeline: An integrated tool to annotate eukaryotic proteins using multi-omics data

Published: 2022-11-22
Volume: 13
ISSN: 1664-8021
Journal: Frontiers in Genetics (Front. Genet.)

Authors:

Maia Guilherme Augusto, Filho Vilmar Benetti, Kawagoe Eric Kazuo, Teixeira Soratto Tatiany Aparecida, Moreira Renato Simões, Grisard Edmundo Carlos, Wagner Glauber

Abstract

Assignment of gene function has been a crucial, laborious, and time-consuming step in genomics. Because a variety of sequencing platforms generate ever-increasing amounts of data, manual annotation is no longer feasible. Thus, an integrated, automated pipeline that uses experimental data to validate in silico predictions of gene function is of utmost relevance. Here, we present a computational workflow named AnnotaPipeline that integrates distinct software and data types in a proteogenomic approach to annotate and validate predicted features in genomic sequences. Starting from (i) nucleotide or (ii) protein sequences in FASTA format, or (iii) structural annotation files (GFF3), users can also supply FASTQ RNA-seq data and MS/MS data in mzXML or similar formats; the pipeline uses this transcriptomic and proteomic information to corroborate annotations and validate gene predictions, providing transcription and expression evidence for functional annotation. The available Arabidopsis thaliana, Caenorhabditis elegans, Candida albicans, Trypanosoma cruzi, and Trypanosoma rangeli genomes were reannotated with AnnotaPipeline, yielding a higher proportion of annotated proteins and a lower proportion of hypothetical proteins than the publicly available annotations for these organisms. AnnotaPipeline is a Unix-based pipeline written in Python and is available at: https://github.com/bioinformatics-ufsc/AnnotaPipeline.
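The abstract describes corroborating functional annotations with transcriptomic (RNA-seq) and proteomic (MS/MS) evidence and reports the share of proteins left as "hypothetical". As a rough illustration only, and not AnnotaPipeline's actual code or output format, the sketch below tags each predicted protein with the omics layers that support it and computes the fraction still labelled hypothetical. The `ProteinEvidence` record, the `MIN_TPM` and `MIN_PEPTIDES` cut-offs, and the keyword rule for "hypothetical" are all hypothetical choices; in practice the TPM values would come from an RNA-seq quantifier and the peptide sets from an MS/MS search, and the toy records are made up.

```python
from dataclasses import dataclass, field


# Hypothetical per-protein record; field names are illustrative only,
# not AnnotaPipeline's actual schema.
@dataclass
class ProteinEvidence:
    protein_id: str
    description: str      # assumed: functional description from a similarity search
    tpm: float = 0.0      # assumed: transcripts per million from RNA-seq quantification
    peptides: set[str] = field(default_factory=set)  # assumed: distinct MS/MS peptide matches


# Assumed cut-offs, chosen for illustration; real thresholds are dataset-dependent.
MIN_TPM = 1.0
MIN_PEPTIDES = 1


def is_hypothetical(description: str) -> bool:
    """Simple keyword rule: 'hypothetical' or 'uncharacterized' means unannotated."""
    text = description.lower()
    return "hypothetical" in text or "uncharacterized" in text


def evidence_level(rec: ProteinEvidence) -> str:
    """Report which omics layers (transcript, protein) support a predicted protein."""
    transcribed = rec.tpm >= MIN_TPM
    detected = len(rec.peptides) >= MIN_PEPTIDES
    if transcribed and detected:
        return "transcript+protein"
    if detected:
        return "protein"
    if transcribed:
        return "transcript"
    return "none"


def hypothetical_fraction(records: list[ProteinEvidence]) -> float:
    """Fraction of records whose description is still hypothetical/uncharacterized."""
    if not records:
        return 0.0
    return sum(is_hypothetical(r.description) for r in records) / len(records)


if __name__ == "__main__":
    # Made-up toy records for demonstration.
    records = [
        ProteinEvidence("p1", "ATP-dependent RNA helicase", tpm=35.2, peptides={"LVDEAK", "GTPSYK"}),
        ProteinEvidence("p2", "hypothetical protein", tpm=2.4),
        ProteinEvidence("p3", "Uncharacterized protein", peptides={"AVLMNR"}),
        ProteinEvidence("p4", "kinesin-like protein"),
    ]
    for r in records:
        print(r.protein_id, evidence_level(r))
    # p1 transcript+protein / p2 transcript / p3 protein / p4 none
    print(f"hypothetical fraction: {hypothetical_fraction(records):.2f}")  # 0.50
```

The keyword rule is used here only because it is easy to read; a production pipeline may classify hypothetical proteins differently, and the evidence thresholds would need tuning for each organism and dataset.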

Publisher

Frontiers Media SA

Subject

Genetics (clinical), Genetics, Molecular Medicine

References (37 articles)

1. VEuPathDB: The eukaryotic pathogen, vector and host bioinformatics resource center; Amos; Nucleic Acids Res., 2022

2. Near-optimal probabilistic RNA-seq quantification; Bray; Nat. Biotechnol., 2016

3. GeneMark-EP+: Eukaryotic gene prediction with self-training in the space of genes and proteins; Brůna; NAR Genom. Bioinform., 2020

4. Sensitive protein alignments at tree-of-life scale using DIAMOND; Buchfink; Nat. Methods, 2021

5. BLAST+: Architecture and applications; Camacho; BMC Bioinforma., 2009

Cited by 1 article

1. Shotgun proteomics of detergent-solubilized proteins from Trypanosoma evansi; Journal of Proteomics, 2024-07