Affiliation:
1. Department of Health Technology, Section for Bioinformatics, Technical University of Denmark
Abstract
Predicting the interaction between Major Histocompatibility Complex (MHC) class I-presented peptides and T-cell receptors (TCRs) holds significant implications for vaccine development, cancer treatment, and autoimmune disease therapies. However, limited paired-chain TCR data, skewed towards well-studied epitopes, hampers the development of pan-specific machine-learning (ML) models. Leveraging a larger peptide-TCR dataset, we explore various alterations to the ML architectures and training strategies to address data imbalance. These changes improve overall performance, particularly for peptides with scant TCR data. However, challenges persist for unseen peptides, especially those distant from training examples. We demonstrate that such ML models can be used to detect potential outliers, whose removal from the training data further improves performance. Integrating pan-specific and peptide-specific models with similarity-based predictions improves overall performance further, especially when a low false positive rate is desirable. In the context of the IMMREP22 benchmark, this modeling framework attained state-of-the-art performance. Moreover, combining these strategies results in acceptable predictive accuracy for peptides characterized by as few as 15 positive TCRs. This observation holds great promise for rapidly expanding the peptide coverage of current models for predicting TCR specificity. The NetTCR 2.2 model incorporating these advances is available on GitHub (https://github.com/mnielLab/NetTCR-2.2) and as a web server at https://services.healthtech.dtu.dk/services/NetTCR-2.2/.
Funder
Inno4Vac
National Institute of Allergy and Infectious Diseases
Publisher
eLife Sciences Publications, Ltd