Affiliation:
1. Oak Ridge National Laboratory, Oak Ridge, TN, USA
Abstract
The COVID-19 pandemic has highlighted the need for computational tools that automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ∼9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model on an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of two specific protein targets, the SARS-CoV-2 main protease (Mpro) and papain-like protease (PLpro). We used a genetic algorithm, driven by the generation and scoring capabilities of the language model, to search for optimal candidates. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.
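The following is a minimal, hypothetical sketch (not the authors' released code) of the generate-and-score loop described in the abstract: a masked SMILES language model proposes candidate variants, a fine-tuned affinity model scores each candidate against a protein target, and a simple genetic algorithm keeps the best scorers. The checkpoint names "chem-bert-smiles" and "chem-bert-affinity" are placeholders, and the pairing of protein sequence with SMILES in the scorer input is an assumption about the fine-tuning setup.

```python
# Hypothetical sketch of the generation + scoring genetic algorithm loop.
# Checkpoint names are placeholders; they are not published model identifiers.
import random
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

fill_mask = pipeline("fill-mask", model="chem-bert-smiles")       # SMILES generator (assumed checkpoint)
tok = AutoTokenizer.from_pretrained("chem-bert-affinity")         # scorer tokenizer (assumed checkpoint)
scorer = AutoModelForSequenceClassification.from_pretrained(
    "chem-bert-affinity", num_labels=1)                           # regression head -> predicted affinity

def mutate(smiles: str) -> str:
    """Mask one token of a SMILES string and let the language model fill it in."""
    tokens = fill_mask.tokenizer.tokenize(smiles)
    if not tokens:
        return smiles
    i = random.randrange(len(tokens))
    tokens[i] = fill_mask.tokenizer.mask_token
    masked = fill_mask.tokenizer.convert_tokens_to_string(tokens)
    return fill_mask(masked, top_k=1)[0]["sequence"]

def score(smiles: str, target_seq: str) -> float:
    """Predict binding affinity for a (protein sequence, SMILES) pair."""
    inputs = tok(target_seq, smiles, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return scorer(**inputs).logits.item()

def evolve(population, target_seq, generations=10, keep=16):
    """Alternate generation (mutation) and selection (affinity ranking)."""
    for _ in range(generations):
        population = population + [mutate(s) for s in population]  # generation step
        population.sort(key=lambda s: score(s, target_seq), reverse=True)
        population = population[:keep]                             # selection step
    return population
```

In practice the selection criterion could combine predicted affinity with other objectives (validity, synthesizability), but the core loop above captures the generate-then-score structure the abstract describes.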
Funder
U.S. Department of Energy
Subject
Hardware and Architecture, Theoretical Computer Science, Software