Protein Language Models Expose Viral Mimicry and Immune Escape

Published: 2024-03-15

Author:

Ofer Dan, Linial Michal

Abstract

Motivation: Viruses elude the immune system through molecular mimicry, adopting biophysical characteristics of their host. We adapt protein language models (PLMs) to differentiate between human and viral proteins. Understanding where the immune system and our models make mistakes could reveal viral immune escape mechanisms.

Results: We applied pretrained deep-learning PLMs to distinguish viral from human proteins. Our predictors show state-of-the-art results, with an AUC of 99.7%. We use interpretable error analysis models to characterize viral escapers. Altogether, mistakes account for 3.9% of the sequences, with viral proteins being disproportionately misclassified. Analysis of external variables, including taxonomy and functional annotations, indicated that errors typically involve proteins with low immunogenic potential, viruses specific to human hosts, and viruses that use reverse-transcriptase enzymes for their replication. Viral families causing chronic infections and immune evasion are further enriched, and their protein mimicry potential is discussed. We provide insights into viral adaptation strategies and highlight the combined potential of PLMs and explainable AI in uncovering mechanisms of viral immune escape, contributing to vaccine design and antiviral research.

Availability and implementation: Data and results are available at https://github.com/ddofer/ProteinHumVir.

Contact: michall@cc.huji.ac.il
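The two headline metrics in the abstract, AUC and the share of misclassified sequences per class, can be illustrated with a minimal sketch. This is not the authors' pipeline: the labels and scores below are made-up toy values, the 0.5 decision threshold is an assumption, and the helper names `roc_auc` and `error_rates` are hypothetical.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive (viral) example
    scores higher than a random negative (human) one; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def error_rates(labels: Sequence[int], scores: Sequence[float],
                threshold: float = 0.5) -> Dict[str, float]:
    """Misclassification rate per class and overall at a fixed threshold."""
    preds = [int(s >= threshold) for s in scores]
    rates = {}
    for cls, name in ((1, "viral"), (0, "human")):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rates[name] = sum(preds[i] != cls for i in idx) / len(idx)
    rates["overall"] = sum(p != y for p, y in zip(preds, labels)) / len(labels)
    return rates


# Toy data (assumed for illustration): 1 = viral, 0 = human.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.70, 0.10, 0.20, 0.05, 0.30, 0.60, 0.15]

auc = roc_auc(labels, scores)
rates = error_rates(labels, scores)
print(f"AUC={auc:.3f}", rates)
```

Comparing the per-class rates shows whether one class (here, viral) is misclassified disproportionately, which is the kind of imbalance the abstract reports.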

Publisher

Cold Spring Harbor Laboratory
