MaskMol: Knowledge-guided Molecular Image Pre-Training Framework for Activity Cliffs with Pixel Masking

Published: 2024-09-09

Author:

Cheng Zhixiang, Xiang Hongxin, Ma Pengsen, Zeng Li, Jin Xin, Yang Xixi, Lin Jianxin, Deng Yang, Song Bosheng, Feng Xinxin, Deng Changhui, Zeng Xiangxiang

Abstract

Activity cliffs are pairs of molecules that are structurally similar but differ significantly in potency. They can cause model representations to collapse, making it difficult for a model to tell the two molecules apart. Our research indicates that as molecular similarity increases, graph-based methods struggle to capture these nuances, whereas image-based approaches effectively retain the distinctions. We therefore developed MaskMol, a knowledge-guided molecular image self-supervised learning framework. MaskMol learns accurate representations of molecular images by considering multiple levels of molecular knowledge: atoms, bonds, and substructures. Using pixel masking tasks, MaskMol extracts fine-grained information from molecular images, overcoming the limitations of existing deep learning models in identifying subtle structural changes. Experimental results demonstrate MaskMol’s high accuracy and transferability in activity cliff estimation and compound potency prediction across 20 different macromolecular targets, outperforming 25 state-of-the-art deep learning and machine learning approaches. Visualization analyses reveal MaskMol’s high biological interpretability in identifying activity cliff-relevant molecular substructures. Notably, through MaskMol, we identified candidate EP4 inhibitors that could be used to treat tumors. This study not only raises awareness about activity cliffs but also introduces a novel method for molecular image representation learning and virtual screening, advancing drug discovery and providing new insights into structure-activity relationships (SAR). Code is available at https://github.com/ZhixiangCheng/MaskMol.
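The atom-level pixel masking idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the MaskMol implementation: the helper `mask_region`, the toy 6x6 image, the atom-to-pixel mapping `atom_pixels`, and the square patch shape are all assumptions made for illustration. The sketch only shows the general pattern of hiding the pixels around one known atom so that a model could be asked to recover which atom was hidden.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale image stored as rows of pixel intensities


def mask_region(image: Image, center: Tuple[int, int], radius: int, fill: int = 0) -> Image:
    """Return a copy of `image` with a square patch around `center` set to `fill`."""
    height, width = len(image), len(image[0])
    row_c, col_c = center
    masked = [row[:] for row in image]  # copy so the input image stays untouched
    for r in range(max(0, row_c - radius), min(height, row_c + radius + 1)):
        for c in range(max(0, col_c - radius), min(width, col_c + radius + 1)):
            masked[r][c] = fill
    return masked


# Toy "molecular image": 255 is background, 50 marks a drawn atom (assumed values).
image: Image = [[255] * 6 for _ in range(6)]
atom_pixels: Dict[int, Tuple[int, int]] = {0: (1, 1), 1: (4, 4)}  # hypothetical atom index -> (row, col)
for row, col in atom_pixels.values():
    image[row][col] = 50

masked_atom = 1  # the label a model would be trained to recover
masked_image = mask_region(image, atom_pixels[masked_atom], radius=1, fill=0)
```

In a real pipeline the pixel coordinates would come from the molecule renderer rather than a hand-written dictionary, and the same pattern could be applied to bond or substructure regions instead of single atoms.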

Publisher

Cold Spring Harbor Laboratory
