Abstract
Protein therapeutics already have a wide range of applications, including disrupting protein interactions, acting as potent vaccines, and replacing genetically deficient proteins. Therapeutics must avoid triggering unwanted immune responses towards the therapeutic protein or viral vector proteins. In contrast, vaccines must support a robust immune reaction targeting a broad range of pathogen variants. Therefore, computational methods that modify a protein’s immunogenicity without disrupting its function are needed. While many components of the immune system can be involved in a reaction, we focus on cytotoxic T-lymphocytes (CTLs). These target short peptides presented via the MHC Class I (MHC-I) pathway. To explore the limits of modifying the visibility of those peptides to CTLs within the distribution of naturally occurring sequences, we developed a novel machine learning technique, CAPE-XVAE. It combines a language model with reinforcement learning to modify a protein’s immune visibility. Our results show that CAPE-XVAE effectively modifies the visibility of the HIV Nef protein to CTLs. We contrast CAPE-XVAE with CAPE-Packer, a physics-based method we also developed. Compared to CAPE-Packer, the machine learning approach suggests sequences that draw upon local sequence similarities in the training set. This is beneficial for vaccine development, where the synthetic sequence should be representative of the real viral population. Additionally, the language model approach holds promise for preserving both known and unknown functional constraints, which are essential for the immune modulation of therapeutic proteins. In contrast, CAPE-Packer emphasizes preserving the protein’s overall fold and can reach greater extremes of immune visibility, but falls short of capturing the sequence diversity of the viral variants available to learn from.
Source code
https://github.com/hcgasser/CAPE (Tag: v1.1)
Publisher
Cold Spring Harbor Laboratory