Do Domain-Specific Protein Language Models Outperform General Models on Immunology-Related Tasks?

Published: 2023-10-20

Author:

Deutschmann Nicolas, Pelissier Aurelien, Weber Anna, Gao Shuaijun, Bogojeska Jasmina, Martínez María Rodríguez

Abstract

Deciphering the antigen recognition capabilities of T cell and B cell receptors (antibodies) is essential for advancing our understanding of adaptive immune responses. In recent years, protein language models (PLMs) have enabled bioinformatic pipelines in which complex amino acid sequences are transformed into vectorized embeddings that are then applied to a range of downstream analytical tasks. Following their success, domain-specific PLMs tailored to particular proteins, such as immune receptors, have emerged. Domain-specific models are often assumed to have enhanced representation capabilities for targeted applications; however, this assumption has not been thoroughly evaluated. In this manuscript, we assess the efficacy of both generalist and domain-specific transformer-based embeddings in characterizing B and T cell receptors. Specifically, we evaluate the accuracy of models that use these embeddings to predict antigen specificity and to elucidate the evolutionary changes that B cells undergo during an immune response. We demonstrate that the prevailing notion that domain-specific models outperform general models requires a more nuanced examination. We also observe remarkable differences between generalist and domain-specific PLMs, not only in performance but also in how they encode information. Finally, we observe that model size and the choice of embedding layer are essential hyperparameters for different tasks. Overall, our analyses reveal the promising potential of PLMs for modeling protein function while providing insights into how they handle information. We also discuss the crucial factors to take into account when selecting a PLM for a particular task.
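The abstract reports that the choice of embedding layer, together with model size, acts as a task-dependent hyperparameter. A minimal sketch of how such a choice can be made empirically follows, under loudly stated assumptions: per-residue hidden states for every layer are assumed to have been extracted already from some PLM (here they are synthetic toy values), each sequence is summarized by mean pooling, and a nearest-centroid classifier stands in for the downstream model. The function names (`mean_pool`, `nearest_centroid_accuracy`, `select_layer`) are hypothetical; they are not the authors' code or any library's API.

```python
import random
from statistics import mean

# Hypothetical helpers for illustration only; not the paper's code or any PLM library's API.
# Assumed input layout: hidden_states[sequence][layer][residue] -> list of floats (one vector per residue).


def mean_pool(residue_vectors):
    """Average per-residue vectors into one fixed-length sequence embedding."""
    dim = len(residue_vectors[0])
    return [mean(vec[d] for vec in residue_vectors) for d in range(dim)]


def nearest_centroid_accuracy(train_X, train_y, test_X, test_y):
    """Stand-in downstream model: classify each test vector by its closest class centroid."""
    centroids = {}
    for label in sorted(set(train_y)):
        members = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*members)]

    def predict(x):
        # Squared Euclidean distance; ties resolve to the first (smallest) label.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    correct = sum(predict(x) == y for x, y in zip(test_X, test_y))
    return correct / len(test_y)


def select_layer(hidden_states, labels, train_idx, test_idx):
    """Treat the embedding layer as a hyperparameter: score every layer on held-out data."""
    n_layers = len(hidden_states[0])
    scores = []
    for layer in range(n_layers):
        X = [mean_pool(seq[layer]) for seq in hidden_states]
        scores.append(nearest_centroid_accuracy(
            [X[i] for i in train_idx], [labels[i] for i in train_idx],
            [X[i] for i in test_idx], [labels[i] for i in test_idx],
        ))
    best = max(range(n_layers), key=lambda layer: scores[layer])
    return best, scores


# Toy usage: synthetic "hidden states" standing in for real PLM outputs.
rng = random.Random(0)
labels = [0, 1] * 10  # 20 sequences with a hypothetical binary specificity label
hidden_states = []
for y in labels:
    length = rng.randint(8, 15)  # receptor sequences differ in length, hence pooling
    uninformative = [[0.0] * 4 for _ in range(length)]  # carries no label information
    informative = [[10.0 * y + rng.uniform(-1, 1) for _ in range(4)] for _ in range(length)]
    hidden_states.append([uninformative, informative, uninformative])  # layers 0, 1, 2

best, scores = select_layer(hidden_states, labels, train_idx=list(range(12)), test_idx=list(range(12, 20)))
print(best, scores)  # → 1 [0.5, 1.0, 0.5]
```

Mean pooling is used here because receptor sequences vary in length while a downstream classifier needs fixed-size inputs; other summaries (for example, a special-token embedding) are equally plausible. In practice, scoring each layer over several cross-validation splits would be more robust than the single holdout split shown.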

Publisher

Cold Spring Harbor Laboratory

Cited by 1 article

1. Self-supervised learning of T cell receptor sequences exposes core properties for T cell membership; Science Advances; 2024-04-26