UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB

Published:2016-03-07 Issue:15 Volume:32 Page:2264-2271
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Doğan Tunca¹, MacDougall Alistair¹, Saidi Rabie¹, Poggioli Diego¹, Bateman Alex¹, O’Donovan Claire¹, Martin Maria J.¹

Affiliation:

1. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK

Abstract

Motivation: Similarity-based methods have been widely used to infer the properties of genes and gene products that have little or no experimental annotation. New approaches that overcome the limitations of methods relying solely on sequence similarity are attracting increased attention. One of these novel approaches is to use the organization of the structural domains in proteins.

Results: We propose a method for the automatic annotation of protein sequences in the UniProt Knowledgebase (UniProtKB) by comparing their domain architectures, classifying proteins based on the similarities and propagating functional annotation. The performance of this method was measured through a cross-validation analysis using the Gene Ontology (GO) annotation of a subset of UniProtKB/Swiss-Prot. The results demonstrate the effectiveness of this approach in detecting functional similarity, with an average F-score of 0.85. Applying the method to nearly 55.3 million uncharacterized proteins in UniProtKB/TrEMBL resulted in 44 818 178 GO term predictions for 12 172 114 proteins. Of these predictions, 22% were for 2 812 016 previously non-annotated protein entries, indicating the significance of the value added by this approach.

Availability and implementation: The results of the method are available at: ftp://ftp.ebi.ac.uk/pub/contrib/martin/DAAC/.

Contact: tdogan@ebi.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
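To make the idea in the abstract concrete, the following is a minimal sketch, not the DAAC algorithm itself. The abstract describes aligning domain architectures and classifying proteins by architecture similarity; this sketch swaps in a much simpler rule (exact architecture match, then transfer only the GO terms shared by every annotated protein with that architecture) and scores a prediction with the standard F-score. The function names, the protein identifiers and the toy domain/GO assignments are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def propagate_go_terms(annotated, unannotated):
    """Transfer GO terms between proteins with identical domain architectures.

    annotated:   dict mapping protein id -> (architecture tuple, set of GO terms)
    unannotated: dict mapping protein id -> architecture tuple
    Returns a dict mapping protein id -> set of predicted GO terms.
    Assumption: exact architecture matching stands in for the alignment and
    classification steps described in the abstract.
    """
    terms_by_architecture = defaultdict(list)
    for architecture, go_terms in annotated.values():
        terms_by_architecture[architecture].append(set(go_terms))

    predictions = {}
    for protein, architecture in unannotated.items():
        term_sets = terms_by_architecture.get(architecture)
        if term_sets:
            # Keep only terms shared by every annotated protein in the group.
            predictions[protein] = set.intersection(*term_sets)
    return predictions


def f_score(predicted, actual):
    """Harmonic mean of precision and recall for two sets of GO terms."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): architectures are ordered tuples of domain ids.
kinase_sh2 = ("PF00069", "PF00017")
annotated = {
    "P1": (kinase_sh2, {"GO:0004672", "GO:0005524"}),
    "P2": (kinase_sh2, {"GO:0004672", "GO:0005524", "GO:0006468"}),
}
unannotated = {"Q1": kinase_sh2, "Q2": ("PF00001",)}

predictions = propagate_go_terms(annotated, unannotated)
held_out_truth = {"GO:0004672", "GO:0006468"}
score = f_score(predictions["Q1"], held_out_truth)
```

Here Q1 receives the two terms common to P1 and P2, while Q2 receives no prediction because no annotated protein shares its architecture. A cross-validation in the spirit of the one mentioned in the abstract would repeat the scoring step over held-out annotated proteins and average the resulting F-scores.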

Funder

The Scientific and Technological Research Council of Turkey, Post-doctoral Research Fellowship Program

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

http://academic.oup.com/bioinformatics/article-pdf/32/15/2264/39715820/bioinformatics_32_15_2264.pdf

References: 35 articles.

1. Basic local alignment search tool;Altschul;J. Mol. Biol,1990

2. MEME SUITE: tools for motif discovery and searching;Bailey;Nucleic Acids Res,2009

3. The generation of new protein functions by the combination of domains;Bashton;Structure,2007

4. GenBank;Benson;Nucleic Acids Res,2008

5. Domain rearrangements in protein evolution;Björklund;J. Mol. Biol,2005

Cited by 39 articles.

1. Novel anti-Acanthamoeba effects elicited by a repurposed poly (ADP-ribose) polymerase inhibitor AZ9482;Frontiers in Cellular and Infection Microbiology;2024-05-28

2. Mutual annotation‐based prediction of protein domain functions with Domain2GO;Protein Science;2024-05-16

3. RNAseq analysis of Cellvibrio japonicus during starch utilization differentiates between genes encoding carbohydrate active enzymes controlled by substrate detection or growth rate;Microbiology Spectrum;2023-12-12

4. Feature architecture aware phylogenetic profiling indicates a functional diversification of type IVa pili in the nosocomial pathogen Acinetobacter baumannii;PLOS Genetics;2023-07-27

5. An O2-sensing diguanylate cyclase broadly affects the aerobic transcriptome in the phytopathogen Pectobacterium carotovorum;Frontiers in Microbiology;2023-07-07