Abstract
Structural docking between adaptive immune receptors (AIRs), including T cell receptors (TCRs) and B cell receptors (BCRs), and their cognate antigens is one of the most fundamental processes in adaptive immunity. However, current methods for predicting AIR-antigen binding rely largely on sequence-derived features of AIRs and omit the structural features that are essential for binding affinity. In this study, we present a deep-learning framework, termed DeepAIR, for the accurate prediction of AIR-antigen binding by integrating both sequence and structural features of AIRs. DeepAIR consists of three feature encoders (a trainable-embedding-layer-based gene encoder, a transformer-based sequence encoder, and a pre-trained AlphaFold2-based structure encoder), a gating-based attention mechanism that extracts important features, and a tensor fusion mechanism that integrates the extracted features. We train and evaluate DeepAIR on three downstream prediction tasks: the prediction of AIR-antigen binding affinity, the prediction of AIR-antigen binding reactivity, and the classification of the immune repertoire. On five representative datasets, DeepAIR shows outstanding prediction performance in terms of AUC (area under the ROC curve) in predicting binding reactivity to various antigens and in classifying immune repertoires for nasopharyngeal carcinoma (NPC) and inflammatory bowel disease (IBD). DeepAIR is freely available for academic purposes at https://github.com/TencentAILabHealthcare/DeepAIR.
We anticipate that DeepAIR can serve as a useful tool for characterizing and profiling antigen-binding AIRs, thereby informing the design of personalized immunotherapy.
Highlights
Integrating AIR structures predicted by AlphaFold2 significantly improves the prediction accuracy of the binding reactivity between AIRs and antigens.
DeepAIR features a novel deep learning architecture that leverages both a gating-based attention mechanism and a tensor fusion mechanism to effectively extract and integrate informative features from three feature encoders: a trainable-embedding-layer-based gene encoder, a transformer-based sequence encoder, and a pre-trained AlphaFold2-based structure encoder.
DeepAIR is implemented as a biologically interpretable deep learning framework that highlights the key residues in both α and β chains that are critical for predicting AIR-antigen binding.
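To make the two mechanisms named above more concrete, the following is a minimal NumPy sketch, not the DeepAIR implementation. It assumes one common form of gated attention pooling (a tanh branch multiplied by a sigmoid gate, followed by a softmax over residues) and one common form of tensor fusion (the outer product of feature vectors, each extended with a constant 1). All function names, dimensions, and random inputs are hypothetical and stand in for the outputs of the gene, sequence, and structure encoders.

```python
import numpy as np

def gated_attention_pool(H, V, U, w):
    """Pool per-residue features H (n, d) into one vector (d,).

    Assumed form: score_i = w . (tanh(H_i V) * sigmoid(H_i U)),
    followed by a softmax over the n residues.
    """
    gate = 1.0 / (1.0 + np.exp(-(H @ U)))       # sigmoid gate, shape (n, k)
    scores = (np.tanh(H @ V) * gate) @ w        # one score per residue, shape (n,)
    scores = scores - scores.max()              # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ H, weights                 # pooled vector and attention weights

def tensor_fusion(*features):
    """Flattened outer product of feature vectors, each extended with a 1.

    The appended 1 keeps the single-encoder and pairwise terms
    alongside the full three-way interaction terms.
    """
    fused = np.ones(1)
    for f in features:
        fused = np.outer(fused, np.append(f, 1.0)).ravel()
    return fused

# Hypothetical stand-ins for encoder outputs (random, for illustration only).
rng = np.random.default_rng(0)
d, k, n_residues = 4, 3, 10
gene_vec = rng.normal(size=d)                        # gene encoder output
seq_feats = rng.normal(size=(n_residues, d))         # sequence encoder, per residue
struct_feats = rng.normal(size=(n_residues, d))      # structure encoder, per residue
V, U, w = rng.normal(size=(d, k)), rng.normal(size=(d, k)), rng.normal(size=k)

seq_vec, seq_weights = gated_attention_pool(seq_feats, V, U, w)
struct_vec, struct_weights = gated_attention_pool(struct_feats, V, U, w)
fused = tensor_fusion(gene_vec, seq_vec, struct_vec)
print(fused.shape)
```

In a sketch of this kind, the per-residue attention weights are the quantity one would inspect to see which residues contribute most to a prediction, and the fused vector would be passed to a task-specific prediction head.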
Publisher
Cold Spring Harbor Laboratory