Abstract
An individual’s B cell receptor (BCR) repertoire encodes information about past immune responses and the potential for future disease protection. Deciphering the information stored in BCR sequence datasets will transform our fundamental understanding of disease and enable the discovery of novel diagnostics and antibody therapeutics. One of the grand challenges of BCR sequence analysis is the prediction of BCR properties from amino acid sequence alone. Here we present an antibody-specific language model, AntiBERTa, which provides a contextualised representation of BCR sequences. Following pre-training, we show that AntiBERTa embeddings capture biologically relevant information, generalizable to a range of applications. As a case study, we demonstrate how AntiBERTa can be fine-tuned to predict paratope positions from an antibody sequence, outperforming public tools across multiple metrics. To our knowledge, AntiBERTa is the deepest protein family-specific language model, providing a rich representation of BCRs. AntiBERTa embeddings are primed for multiple downstream tasks and can improve our understanding of the language of antibodies.
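The paratope case study described above corresponds to a standard token-classification setup: a per-residue head is placed on top of the pre-trained transformer and fine-tuned on labelled antibody sequences. The sketch below illustrates that general pattern with the Hugging Face transformers API, assuming a RoBERTa-style checkpoint and a residue-level tokenizer; the checkpoint path, the space-separated tokenization, and the two-label head are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch: per-residue paratope prediction with a RoBERTa-style
# antibody language model. "path/to/antiberta-checkpoint" is a placeholder
# for the actual pre-trained AntiBERTa weights and matching tokenizer.

import torch
from transformers import RobertaTokenizerFast, RobertaForTokenClassification

CHECKPOINT = "path/to/antiberta-checkpoint"  # placeholder, not a real hub ID

# Assumption: the tokenizer operates at the amino-acid level, so each
# residue maps to one token; a token-classification head then emits a
# paratope / non-paratope label per residue.
tokenizer = RobertaTokenizerFast.from_pretrained(CHECKPOINT)
model = RobertaForTokenClassification.from_pretrained(CHECKPOINT, num_labels=2)
model.eval()

# Illustrative heavy-chain variable-region fragment (not from the paper).
sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFTFS"
# Space-separate residues so each amino acid becomes its own token.
inputs = tokenizer(" ".join(sequence), return_tensors="pt")

with torch.no_grad():
    # logits shape: (1, seq_len + special tokens, 2)
    logits = model(**inputs).logits

# Probability that each residue is part of the paratope; drop the special
# start/end tokens before mapping predictions back onto residues.
probs = logits.softmax(dim=-1)[0, 1:-1, 1]
for residue, p in zip(sequence, probs.tolist()):
    print(f"{residue}\t{p:.3f}")
```

In practice, fine-tuning would wrap this model in a training loop over paratope-annotated sequences (binary labels per residue), updating both the classification head and, typically, the transformer body.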
Publisher
Cold Spring Harbor Laboratory