GORetriever: reranking protein-description-based GO candidates by literature-driven deep information retrieval for protein function annotation

Author:

Yan Huiying [1], Wang Shaojun [1], Liu Hancheng [1], Mamitsuka Hiroshi [2,3], Zhu Shanfeng [1,4,5,6]

Affiliation:

1. Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China

2. Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto Prefecture 611-0011, Japan

3. Department of Computer Science, Aalto University, Espoo 00076, Finland

4. Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai 200433, China

5. Shanghai Key Lab of Intelligent Information Processing and Shanghai Institute of Artificial Intelligence Algorithm, Fudan University, Shanghai 200433, China

6. Zhangjiang Fudan International Innovation Center, Shanghai 200433, China

Abstract

Summary: The vast majority of proteins still lack experimentally validated functional annotations, which highlights the importance of developing high-performance automated protein function prediction/annotation (AFP) methods. While existing approaches focus on protein sequences, networks, and structural data, textual information related to proteins has been overlooked, even though roughly 82% of SwissProt proteins already have literature information annotated by experts. To use literature information efficiently and effectively, we present GORetriever, a two-stage deep information retrieval-based method for AFP. Given a target protein, in the first stage, candidate Gene Ontology (GO) terms are retrieved by using annotated proteins with similar descriptions. In the second stage, the GO terms are reranked based on semantic matching between the GO definitions and the textual information (literature and protein description) of the target protein. Extensive experiments on benchmark datasets demonstrate the effectiveness of GORetriever in enhancing AFP performance. Note that GORetriever is the key component of GOCurator, which achieved first place in the latest Critical Assessment of protein Function Annotation (CAFA5, in which over 1600 teams participated), held in 2023–2024.

Availability and implementation: GORetriever is publicly available at https://github.com/ZhuLab-Fudan/GORetriever.
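
The two-stage procedure described in the summary (retrieve candidate GO terms from similarly described proteins, then rerank them against the target protein's text) can be illustrated with a minimal sketch. This is not the GORetriever implementation: plain token overlap (Jaccard similarity) stands in for the deep retrieval and semantic matching models, and every function name, GO label, description, and definition below is a hypothetical toy value chosen for illustration.

```python
import re


def tokens(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity; an assumed stand-in for a learned text matcher."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve_candidates(target_description, annotated, top_k=2):
    """Stage 1: pool GO terms from the top_k annotated proteins whose
    descriptions are most similar to the target's description."""
    ranked = sorted(
        annotated,
        key=lambda p: jaccard(target_description, p["description"]),
        reverse=True,
    )
    candidates = set()
    for protein in ranked[:top_k]:
        candidates.update(protein["go_terms"])
    return candidates


def rerank(candidates, go_definitions, target_text):
    """Stage 2: order candidates by how well each GO definition matches the
    target's text (literature plus description); ties break by label."""
    scored = [(go, jaccard(go_definitions[go], target_text)) for go in candidates]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy data: labels and texts are invented, not real GO identifiers or definitions.
annotated = [
    {"description": "serine threonine protein kinase", "go_terms": ["GO:kinase", "GO:atp"]},
    {"description": "transcription factor dna binding protein", "go_terms": ["GO:dna"]},
    {"description": "tyrosine protein kinase receptor", "go_terms": ["GO:kinase"]},
]
go_definitions = {
    "GO:kinase": "catalysis of the transfer of a phosphate group to a protein",
    "GO:atp": "binding to atp",
    "GO:dna": "binding to dna",
}
target_description = "protein kinase"
target_text = "protein kinase that phosphorylates substrates by transfer of a phosphate group"

candidates = retrieve_candidates(target_description, annotated, top_k=2)
ranked = rerank(candidates, go_definitions, target_text)
for go, score in ranked:
    print(f"{go}\t{score:.2f}")
```

In this toy run, stage 1 keeps only the terms attached to the two kinase-like proteins, and stage 2 places the kinase term above the ATP-binding term because its definition shares more tokens with the target text. Separating the two stages keeps the costlier matching step restricted to a short candidate list.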

Funder

National Natural Science Foundation of China

Shanghai Municipal Science and Technology Major

ZJ Lab and Shanghai Center for Brain Science

Brain-Inspired Intelligence Technology

MEXT KAKENHI

Academy of Finland

Publisher

Oxford University Press (OUP)
