Abstract
Protein function inference relies on annotating protein domains via sequence similarity, often modeled through profile hidden Markov models (profile HMMs), which capture evolutionary diversity within related domains. However, profile HMMs make strong simplifying independence assumptions when modeling the residues in a sequence. Here, we introduce PSALM (Protein Sequence Annotation with Language Models), a hierarchical approach that relaxes these assumptions and uses representations of protein sequences learned by protein language models to enable high-sensitivity, high-specificity residue-level protein sequence annotation. We validate PSALM's performance on a curated set of "ground truth" annotations determined by a profile HMM-based method and highlight PSALM as a promising alternative for protein sequence annotation.
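To make the independence assumption concrete, here is a minimal sketch of generic profile scoring. It is not PSALM's method and not HMMER's implementation. It assumes a toy ungapped profile with match states only (no insert or delete states), a hypothetical 4-letter alphabet, and made-up emission probabilities. Each residue is scored using only its own profile position, so the total log-odds score breaks down into a sum of per-position terms. A residue's contribution therefore does not depend on its neighbors.

```python
import math

# Hypothetical toy setup: a 4-letter alphabet with a uniform background.
# Real profile HMMs use the 20 amino acids plus insert/delete states,
# all of which this sketch omits.
ALPHABET = "ACDE"
BACKGROUND = {a: 0.25 for a in ALPHABET}

# Hypothetical per-position (match-state) emission probabilities.
PROFILE = [
    {"A": 0.7, "C": 0.1, "D": 0.1, "E": 0.1},
    {"A": 0.1, "C": 0.1, "D": 0.7, "E": 0.1},
    {"A": 0.25, "C": 0.25, "D": 0.25, "E": 0.25},
]


def per_position_terms(seq, profile=PROFILE, background=BACKGROUND):
    """Log-odds term for each position, computed from that position alone."""
    if len(seq) != len(profile):
        raise ValueError("sequence length must equal profile length")
    return [math.log(col[res] / background[res]) for col, res in zip(profile, seq)]


def log_odds_score(seq, profile=PROFILE, background=BACKGROUND):
    """Independence assumption: the sequence score is a sum of per-position terms."""
    return sum(per_position_terms(seq, profile, background))


if __name__ == "__main__":
    print(log_odds_score("ADC"))
    # The first residue contributes the same amount regardless of its neighbors.
    print(per_position_terms("ADA")[0], per_position_terms("AEC")[0])
```

Because of this decomposition, no term can capture a dependency between residues at different positions. Relaxing that limitation is the motivation the abstract gives for using protein language model representations instead.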
Publisher
Cold Spring Harbor Laboratory