X-MOL: large-scale pre-training for molecular understanding and diverse molecular analysis

Published: 2020-12-26

Authors:

Xue Dongyu, Zhang Han, Xiao Dongling, Gong Yukang, Chuai Guohui, Sun Yu, Tian Hao, Wu Hua, Li Yukun, Liu Qi

Abstract

In silico modelling and analysis of small molecules substantially accelerates the process of drug development. Representing and understanding molecules is the fundamental step for various in silico molecular analysis tasks. Traditionally, these molecular analysis tasks have been investigated individually and separately. In this study, we presented X-MOL, which applies large-scale pre-training technology on 1.1 billion molecules for molecular understanding and representation, and then, carefully designed fine-tuning was performed to accommodate diverse downstream molecular analysis tasks, including molecular property prediction, chemical reaction analysis, drug-drug interaction prediction, de novo generation of molecules and molecule optimization. As a result, X-MOL was proven to achieve state-of-the-art results on all these molecular analysis tasks with good model interpretation ability. Collectively, taking advantage of super large-scale pre-training data and super-computing power, our study practically demonstrated the utility of the idea of “mass makes miracles” in molecular representation learning and downstream in silico molecular analysis, indicating the great potential of using large-scale unlabelled data with carefully designed pre-training and fine-tuning strategies to unify existing molecular analysis tasks and substantially enhance the performance of each task.

Publisher

Cold Spring Harbor Laboratory

Cited by 12 articles.

1. MolLM: A Unified Language Model to Integrate Biomedical Text with 2D and 3D Molecular Representations; 2023-11-25

2. A simple and efficient graph Transformer architecture for molecular properties prediction; Chemical Engineering Science; 2023-10

3. Synergistic Fusion of Graph and Transformer Features for Enhanced Molecular Property Prediction; 2023-08-31

4. Adaptive language model training for molecular design; Journal of Cheminformatics; 2023-06-08

5. SELFormer: molecular representation learning via SELFIES language models; Machine Learning: Science and Technology; 2023-06-01