Abstract
Background: Human leukocyte antigen (HLA) alleles are critical components of the immune system’s ability to recognize and eliminate tumors and infections. Many machine learning-based tools for predicting major histocompatibility complex (MHC) binding affinity (BA) have been developed and are widely used in both investigational and therapeutic applications, so it is important to understand how their outputs differ.
Methods: We examined the predictions of four popular tools (netMHCpan, HLAthena, MHCflurry, and MHCnuggets) across a range of possible peptide sources (human, viral, and randomly generated) and MHC class I alleles.
Results: We uncovered inconsistencies in predicted BA, in allele promiscuity, and in the relationship between the physical properties of peptides (by source) and BA predictions, as well as in the quality of training data. The amount of training data did not explain the inconsistencies between tools, yet for every tool the predicted binding quantities were similar between human and viral proteomes. Lastly, we found that peptide physical properties are associated with allele-specific binding predictions.
Conclusions: Our work raises fundamental questions about the fidelity of peptide-MHC binding prediction tools and their real-world implications. Relying on these tools to predict the theoretical binding of peptides to alleles is concerning, because the range of allele promiscuity is substantial yet does not differentiate between potential foreign antigens and self-antigens. Evaluating more viruses, as well as bacteria, fungi, and other pathogens, and linking these analyses with metrics such as evolutionary distance may give greater insight into the relationship between HLA evolution and disease.
Subject
General Pharmacology, Toxicology and Pharmaceutics; General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; General Medicine