De novo drug design as GPT language modeling: large chemistry models with supervised and reinforcement learning

Author:

Ye Gavin

Abstract

AbstractIn recent years, generative machine learning algorithms have been successful in designing innovative drug-like molecules. SMILES is a sequence-like language used in most effective drug design models. Due to data’s sequential structure, models such as recurrent neural networks and transformers can design pharmacological compounds with optimized efficacy. Large language models have advanced recently, but their implications on drug design have not yet been explored. Although one study successfully pre-trained a large chemistry model (LCM), its application to specific tasks in drug discovery is unknown. In this study, the drug design task is modeled as a causal language modeling problem. Thus, the procedure of reward modeling, supervised fine-tuning, and proximal policy optimization was used to transfer the LCM to drug design, similar to Open AI’s ChatGPT and InstructGPT procedures. By combining the SMILES sequence with chemical descriptors, the novel efficacy evaluation model exceeded its performance compared to previous studies. After proximal policy optimization, the drug design model generated molecules with 99.2% having efficacy pIC50 > 7 towards the amyloid precursor protein, with 100% of the generated molecules being valid and novel. This demonstrated the applicability of LCMs in drug discovery, with benefits including less data consumption while fine-tuning. The applicability of LCMs to drug discovery opens the door for larger studies involving reinforcement-learning with human feedback, where chemists provide feedback to LCMs and generate higher-quality molecules. LCMs’ ability to design similar molecules from datasets paves the way for more accessible, non-patented alternatives to drug molecules.

Publisher

Springer Science and Business Media LLC

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3