Toward Unified AI Drug Discovery with Multimodal Knowledge

Published: 2024-01 Volume: 4
ISSN: 2765-8783
Container-title: Health Data Science
Language: en
Short-container-title: Health Data Sci

Author:

Luo Yizhen¹,², Liu Xing Yi¹, Yang Kai¹, Huang Kui¹,³, Hong Massimo¹,², Zhang Jiahuan¹, Wu Yushuai¹, Nie Zaiqing¹,⁴

Affiliation:

1. Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China.

2. Department of Computer Science and Technology, Tsinghua University, Beijing, China.

3. School of Software and Microelectronics, Peking University, Beijing, China.

4. Beijing Academy of Artificial Intelligence (BAAI), Beijing, China.

Abstract

Background: In real-world drug discovery, human experts typically draw molecular knowledge of drugs and proteins from multimodal sources, including molecular structures, structured knowledge from knowledge bases, and unstructured knowledge from biomedical literature. Existing multimodal approaches in AI drug discovery integrate either structured or unstructured knowledge, but not both, which compromises the holistic understanding of biomolecules. They also fail to address the missing modality problem, in which multimodal information is unavailable for novel drugs and proteins.

Methods: In this work, we present KEDD, a unified, end-to-end deep learning framework that jointly incorporates both structured and unstructured knowledge for a wide range of AI drug discovery tasks. The framework first uses independent representation learning models to extract the underlying characteristics from each modality. It then applies a feature fusion technique to compute the prediction results. To mitigate the missing modality problem, we leverage sparse attention and a modality masking technique to reconstruct the missing features based on the most relevant molecules.

Results: Benefiting from structured and unstructured knowledge, our framework achieves a deeper understanding of biomolecules. KEDD outperforms state-of-the-art models by an average of 5.2% on drug–target interaction prediction, 2.6% on drug property prediction, 1.2% on drug–drug interaction prediction, and 4.1% on protein–protein interaction prediction. Through qualitative analysis, we reveal KEDD’s promising potential in assisting real-world applications.

Conclusions: By incorporating biomolecular expertise from multimodal knowledge, KEDD bears promise in accelerating drug discovery.
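
Illustration: The abstract states only that missing features are reconstructed with sparse attention over the most relevant molecules and that features are then fused; it gives no implementation details. The sketch below is one possible reading of that idea, not KEDD's actual code. The function names, the scaled dot-product scoring, the top-k selection, the concatenation-based fusion, and the toy vectors are all assumptions made for illustration. Modality masking, which would apply during training, is not shown.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def reconstruct_missing_feature(query_struct, bank_struct, bank_knowledge, k=2):
    """Estimate a missing knowledge feature for a query molecule.

    Hypothetical helper: attends only to the k bank molecules whose
    structure features score highest against the query (a simple form
    of sparse attention) and returns the softmax-weighted average of
    their knowledge features.
    """
    scale = math.sqrt(len(query_struct))
    scores = [dot(query_struct, s) / scale for s in bank_struct]
    # Keep only the indices of the k highest-scoring bank entries.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected entries.
    peak = max(scores[i] for i in top)
    exp_scores = [math.exp(scores[i] - peak) for i in top]
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]
    dim = len(bank_knowledge[0])
    return [
        sum(w * bank_knowledge[i][j] for w, i in zip(weights, top))
        for j in range(dim)
    ]


def fuse(struct_feat, knowledge_feat):
    """Assumed fusion step: plain concatenation of the two feature vectors."""
    return list(struct_feat) + list(knowledge_feat)


# Toy data (made up): three known molecules with structure and knowledge features.
bank_struct = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
bank_knowledge = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A novel molecule with a structure feature but no knowledge feature.
query_struct = [1.0, 0.0]

reconstructed = reconstruct_missing_feature(query_struct, bank_struct, bank_knowledge, k=2)
fused = fuse(query_struct, reconstructed)
print(reconstructed)
print(len(fused))
```

In this toy setup the third bank molecule is structurally dissimilar to the query, so it falls outside the top-k set and contributes nothing to the reconstructed feature. A learned model would replace the fixed dot-product scoring and the concatenation with trained components.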

Funder

National Key Research and Development Program of China

Publisher

American Association for the Advancement of Science (AAAS)

Link

https://spj.science.org/doi/pdf/10.34133/hds.0113

References (77 articles)

1. Drug Discovery: A Historical Perspective

2. Identification of Direct Protein Targets of Small Molecules

3. Drug repurposing: progress, challenges and recommendations

4. Artificial intelligence in drug discovery and development

5. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules;Weininger D;J Chem Inf Comput Sci,1988

Cited by 3 articles.

1. Large language models for medicine: a survey;International Journal of Machine Learning and Cybernetics;2024-08-19

2. Multi-Modal CLIP-Informed Protein Editing;2024-07-28

3. Unlocking the Future of Drug Development: Generative AI, Digital Twins, and Beyond;BioMedInformatics;2024-06-06