An effective self-supervised framework for learning expressive molecular global representations to drug discovery

Author:

Li Pengyong (1), Wang Jun (2), Qiao Yixuan (3), Chen Hao (4), Yu Yihuan (5), Yao Xiaojun (6), Gao Peng (2), Xie Guotong (2), Song Sen (7)

Affiliation:

1. Department of Biomedical Engineering at Tsinghua University, China

2. Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China

3. Operations Research and Cybernetics at Beijing University of Technology, China

4. Cybernetics at Beijing University of Technology, China

5. Beijing University of Biomedical Engineering, China

6. Analytical Chemistry and Chemoinformatics at Lanzhou University, China

7. Tsinghua Laboratory of Brain and Intelligence and Department of Biomedical Engineering, Tsinghua University, Haidian, 100084 Beijing, China

Abstract

Producing expressive molecular representations is a fundamental challenge in artificial intelligence-driven drug discovery. Graph neural networks (GNNs) have emerged as a powerful technique for modeling molecular data. However, previous supervised approaches usually suffer from the scarcity of labeled data and from poor generalization capability. Here, we propose a novel graph-based deep learning framework for molecular pre-training, named MPG, that learns molecular representations from large-scale unlabeled molecules. In MPG, we propose MolGNet, a powerful GNN for modeling molecular graphs, and design an effective self-supervised strategy for pre-training the model at both the node and graph level. After pre-training on 11 million unlabeled molecules, we show that MolGNet captures valuable chemical insights and produces interpretable representations. The pre-trained MolGNet can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of drug discovery tasks, including molecular property prediction, drug-drug interaction prediction and drug-target interaction prediction, on 14 benchmark datasets. The pre-trained MolGNet in MPG has the potential to become an advanced molecular encoder in the drug discovery pipeline.

Funder

Department of Education Key Innovation Research

Institute Guoqiang at Tsinghua University

National Natural Science Foundation of China

Beijing Brain Science Special Project

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems


Cited by 74 articles.