For antibody sequence generative modeling, mixture models may be all you need

Published: 2024-01-30

Author:

Parkinson Jonathan, Wang Wei

Abstract

Antibody therapeutic candidates must exhibit not only tight binding to their target but also good developability properties, especially low risk of immunogenicity. In this work, we fit a simple generative model, SAM, to sixty million human heavy and seventy million human light chains. We show that the probability of a sequence calculated by the model distinguishes human sequences from other species with the same or better accuracy on a variety of benchmark datasets containing >400 million sequences than any other model in the literature, outperforming large language models (LLMs) by large margins. SAM can humanize sequences, generate new sequences, and score sequences for humanness. It is both fast and fully interpretable. Our results highlight the importance of using simple models as baselines for protein engineering tasks. We additionally introduce a new tool for numbering antibody sequences which is orders of magnitude faster than existing tools in the literature. Both these tools are available at https://github.com/Wang-lab-UCSD/AntPack.

Publisher

Cold Spring Harbor Laboratory

References (28 articles)

1. Engineering antibody therapeutics; Curr. Opin. Struct. Biol., 2016

2. Predicting Antibody Developability Profiles Through Early Stage Discovery Screening; mAbs, 2020

3. The immunogenicity of humanized and fully human antibodies

4. Antibody humanization methods – a review and update

5. Humanization of antibodies using a machine learning approach on large-scale repertoire data

Cited by 1 article.

1. Seq2scFv: a toolkit for the comprehensive analysis of display libraries from long-read sequencing platforms; 2024-07-06