Hierarchical deep learning for predicting GO annotations by integrating protein knowledge

Author:

Merino Gabriela A (1, 2, 3), Saidi Rabie (3), Milone Diego H (2), Stegmayer Georgina (2), Martin Maria J (3)

Affiliation:

1. Bioengineering and Bioinformatics Research and Development Institute (IBB), FI-UNER, CONICET, Oro Verde 3100, Argentina

2. Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe 3000, Argentina

3. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK

Abstract

Motivation: Experimental testing and manual curation are the most precise ways of assigning Gene Ontology (GO) terms that describe protein functions. However, they are expensive and time-consuming, and they cannot keep pace with the exponential growth of data generated by high-throughput sequencing methods. Hence, researchers need reliable computational systems to help fill the gap with automatic function prediction. The results of the most recent Critical Assessment of Function Annotation challenge revealed that GO term prediction remains a very challenging task. Recent developments in deep learning are pushing the frontiers of protein research, yielding new knowledge thanks to the integration of data from multiple sources. However, the deep models developed so far for function prediction focus mainly on sequence data and have not yet achieved breakthrough performance.

Results: We propose DeeProtGO, a novel deep-learning model for predicting GO annotations by integrating protein knowledge. DeeProtGO was trained to solve 18 different prediction problems, defined by the three GO sub-ontologies, the type of proteins and the taxonomic kingdom. Our experiments showed higher prediction quality when more protein knowledge is integrated. We also benchmarked DeeProtGO against state-of-the-art methods on public datasets and showed that it can effectively improve the prediction of GO annotations.

Availability and implementation: DeeProtGO and a use case are available at https://github.com/gamerino/DeeProtGO.

Supplementary information: Supplementary data are available at Bioinformatics online.

Funder

ANPCyT

UNL

UNER

the CABANA project-BBSRC

European Molecular Biology Laboratory core funds

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

