Authors:
Zhang Odin, Jin Jieyu, Lin Haitao, Zhang Jintu, Hua Chenqing, Huang Yufei, Zhao Huifeng, Hsieh Chang-Yu, Hou Tingjun
Abstract
AI-aided drug design has facilitated structure-based molecule generation strategies. However, despite significant success, the scarcity of protein-ligand data prevents these models from fully exploring chemical space and discovering unexplored potential drugs. This limited sampling of chemical space runs counter to the original purpose of generative models, which is to explore a broader chemical space; we term this contradiction the Chemical Space Generation Paradox. To address it, we developed ECloudGen, which has two key attributes. (1) Fundamental physical representation: we introduce an electron cloud representation that unifies all biological forces under a single representation and provides a compact, continuous learning space. (2) Broad and structurally ordered chemical space: by using electron clouds as generative agents, ECloudGen can leverage data that lack binding structures, giving it access to a broader chemical space. In our implementation, ECloudDiff, a latent ECloud-based diffusion model, samples high-fidelity electron clouds conditioned on pocket structures, and CEMP, a novel contrastive learning strategy, structurally organizes the chemical space, thereby enabling controllable generation. Experiments confirm ECloudGen's state-of-the-art performance in generating chemically feasible molecules with high binding efficacy, drug-likeness, and other desirable chemical properties. Extensive experiments further show that ECloudGen covers a broader chemical space and achieves superior performance in controllable generation.
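The abstract does not state the objective behind CEMP. Purely as a hedged illustration of what "contrastive learning to structurally organize the chemical space" can look like, the sketch below implements a generic CLIP-style symmetric InfoNCE loss that pulls matched pairs of embeddings together and pushes mismatched pairs apart. Treating the pairs as electron-cloud and molecule embeddings, the function names, the temperature of 0.1, and the toy data are all assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(cloud_emb: np.ndarray, mol_emb: np.ndarray,
                       temperature: float = 0.1) -> float:
    """Generic CLIP-style contrastive loss (an assumed stand-in, not CEMP itself).

    Row i of `cloud_emb` and row i of `mol_emb` are treated as a matched pair
    (hypothetical electron-cloud and molecule embeddings); every other row in
    the batch serves as a negative.
    """
    logits = l2_normalize(cloud_emb) @ l2_normalize(mol_emb).T / temperature
    diag = np.arange(logits.shape[0])  # positives sit on the diagonal
    loss_c2m = -log_softmax(logits, axis=1)[diag, diag].mean()  # cloud -> molecule
    loss_m2c = -log_softmax(logits, axis=0)[diag, diag].mean()  # molecule -> cloud
    return float((loss_c2m + loss_m2c) / 2)


# Usage: matched pairs should score a much lower loss than mismatched ones.
rng = np.random.default_rng(0)
mol = rng.normal(size=(8, 32))                    # toy molecule embeddings
clouds = mol + 0.01 * rng.normal(size=mol.shape)  # toy "matched" cloud embeddings
aligned = symmetric_info_nce(clouds, mol)
mismatched = symmetric_info_nce(np.roll(clouds, 1, axis=0), mol)  # shift breaks every pair
print(f"aligned={aligned:.4f}  mismatched={mismatched:.4f}")
```

Averaging the two directions makes the loss symmetric, so neither embedding space is privileged; the temperature controls how sharply the softmax separates the positive from the in-batch negatives.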
Publisher
Cold Spring Harbor Laboratory