Author:
Kushwaha Anjana,Duroux Patrice,Giudicelli Véronique,Todorov Konstantin,Kossida Sofia
Abstract
The accurate prediction of peptide-MHC class I binding probabilities is a critical task in immunoinformatics, with broad implications for vaccine development and immunotherapies. While recent deep neural network-based approaches have shown promise in peptide-MHC prediction, they have two shortcomings that limit their practicality: (i) they rely on hand-crafted pseudo-sequence extraction, and (ii) they do not generalise well to different datasets. In this paper, we present PerceiverpMHC, which learns accurate representations from full sequences by leveraging efficient transformer-based architectures. Additionally, we propose IMGT/RobustpMHC, which harnesses unlabeled data to improve the robustness of peptide-MHC binding predictions through a self-supervised learning strategy. We extensively evaluate RobustpMHC on 8 different datasets and demonstrate improvements over state-of-the-art approaches. Finally, we compile CrystalIMGT, a crystallography-verified dataset that presents a challenge to existing approaches due to its significantly different peptide-MHC distributions.
Publisher
Cold Spring Harbor Laboratory