FAPM: Functional Annotation of Proteins using Multi-Modal Models Beyond Structural Modeling

Published: 2024-05-10

Authors:

Xiang Wenkai, Xiong Zhaoping, Chen Huan, Xiong Jiacheng, Zhang Wei, Fu Zunyun, Zheng Mingyue, Liu Bing, Shi Qian

Abstract

Assigning accurate property labels to proteins, like functional terms and catalytic activity, is challenging, especially for proteins without homologs and “tail labels” with few known examples. Unlike previous methods that mainly focused on protein sequence features, we use a pretrained large natural language model to understand the semantic meaning of protein labels. Specifically, we introduce FAPM, a contrastive multi-modal model that links natural language with protein sequence language. This model combines a pretrained protein sequence model with a pretrained large language model to generate labels, such as Gene Ontology (GO) functional terms and catalytic activity predictions, in natural language. Our results show that FAPM excels in understanding protein properties, outperforming models based solely on protein sequences or structures. It achieves state-of-the-art performance on public benchmarks and in-house experimentally annotated phage proteins, which often have few known homologs. Additionally, FAPM’s flexibility allows it to incorporate extra text prompts, like taxonomy information, enhancing both its predictive performance and explainability. This novel approach offers a promising alternative to current methods that rely on multiple sequence alignment for protein annotation. The online demo is at: https://huggingface.co/spaces/wenkai/FAPM_demo.

Publisher

Cold Spring Harbor Laboratory

References (54 articles)

1. Highly accurate protein structure prediction with AlphaFold

2. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models

3. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003

4. UniProt: the Universal Protein Knowledgebase in 2023

5. Gene Ontology: tool for the unification of biology

Cited by 1 article

1. SciMind: A Multimodal Mixture-of-Experts Model for Advancing Pharmaceutical Sciences (2024-07-21)