End-to-end sequence-structure-function meta-learning predicts genome-wide chemical-protein interactions for dark proteins

Published:2023-01-18 Issue:1 Volume:19 Page:e1010851
ISSN:1553-7358
Container-title:PLOS Computational Biology
language:en
Short-container-title:PLoS Comput Biol

Author:

Cai Tian, Xie Li, Zhang Shuo, Chen Muge, He Di, Badkul Amitesh, Liu Yang, Namballa Hari Krishna, Dorogan Michael, Harding Wayne W., Mura Cameron, Bourne Philip E., Xie Lei

Abstract

Systematically discovering protein-ligand interactions across the entire human and pathogen genomes is critical in chemical genomics, protein function prediction, drug discovery, and many other areas. However, more than 90% of gene families remain “dark”: their small-molecule ligands are undiscovered due to experimental limitations or human/historical biases. Existing computational approaches typically fail when a dark protein differs from those with known ligands. To address this challenge, we have developed a deep learning framework, called PortalCG, which consists of four novel components: (i) a 3-dimensional ligand binding site-enhanced sequence pre-training strategy to encode the evolutionary links between ligand-binding sites across gene families; (ii) an end-to-end pretraining-fine-tuning strategy that follows the sequence-structure-function paradigm to reduce the impact of inaccurate predicted structures on function predictions; (iii) a new out-of-cluster meta-learning algorithm that extracts and accumulates information learned from predicting ligands of distinct gene families (meta-data) and applies the meta-data to a dark gene family; and (iv) a stress model selection step that uses gene families in the test data that differ from those in the training and development data sets, to facilitate model deployment in a real-world scenario. In extensive and rigorous benchmark experiments, PortalCG considerably outperformed state-of-the-art machine learning and protein-ligand docking techniques when applied to dark gene families, and demonstrated its generalization power for target identification and compound screening under out-of-distribution (OOD) scenarios. Furthermore, in an external validation of multi-target compound screening, PortalCG outperformed rational design by medicinal chemists.
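Both the stress model selection step and the out-of-cluster meta-learning described above treat the gene family, rather than the individual protein, as the unit of splitting and sampling. The following is a minimal Python sketch of that data handling only, not PortalCG's implementation. It assumes records are hypothetical `(protein_id, family_id, ligand_id, label)` tuples; the function names and split fractions are illustrative. As an added assumption, it also keeps development families out of training, which is stricter than the abstract states (the abstract only requires test families to differ from training and development families).

```python
import random
from collections import defaultdict

# Hypothetical record layout (assumption): (protein_id, family_id, ligand_id, label)


def family_disjoint_split(records, dev_frac=0.1, test_frac=0.1, seed=0):
    """Partition records by gene family so that no test family appears in the
    training or development sets. Assumption: dev families are also kept out of
    training, which is stricter than the abstract requires."""
    families = sorted({r[1] for r in records})
    rng = random.Random(seed)
    rng.shuffle(families)
    n_test = max(1, int(len(families) * test_frac))
    n_dev = max(1, int(len(families) * dev_frac))
    test_fams = set(families[:n_test])
    dev_fams = set(families[n_test:n_test + n_dev])
    splits = {"train": [], "dev": [], "test": []}
    for r in records:
        if r[1] in test_fams:
            splits["test"].append(r)
        elif r[1] in dev_fams:
            splits["dev"].append(r)
        else:
            splits["train"].append(r)
    return splits


def sample_family_episodes(train_records, n_episodes, n_support, n_query, seed=0):
    """Yield meta-training tasks, each drawn from a single gene family, so a
    meta-learner practices adapting to one family at a time."""
    by_family = defaultdict(list)
    for r in train_records:
        by_family[r[1]].append(r)
    eligible = sorted(f for f, rs in by_family.items() if len(rs) >= n_support + n_query)
    if not eligible:
        raise ValueError("no family has enough records for one episode")
    rng = random.Random(seed)
    for _ in range(n_episodes):
        fam = rng.choice(eligible)
        batch = rng.sample(by_family[fam], n_support + n_query)
        yield fam, batch[:n_support], batch[n_support:]


if __name__ == "__main__":
    # Toy data: 20 hypothetical families with 5 records each.
    toy = [(f"P{f}_{i}", f"FAM{f}", f"L{i}", i % 2) for f in range(20) for i in range(5)]
    parts = family_disjoint_split(toy, dev_frac=0.1, test_frac=0.2)
    print({k: len(v) for k, v in parts.items()})  # prints {'train': 70, 'dev': 10, 'test': 20}
```

Splitting by family instead of by record is what makes the test set out-of-distribution: a model cannot score well on a held-out family by memorizing close homologs seen during training.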
Our results also suggest that a differentiable sequence-structure-function deep learning framework, in which protein structural information serves as an intermediate layer, could be superior to the conventional methodology in which predicted protein structures are used for compound screening. We applied PortalCG to two case studies to exemplify its potential in drug discovery: designing selective dual-antagonists of dopamine receptors for the treatment of opioid use disorder (OUD), and illuminating the understudied human genome to find targets for diseases that do not yet have effective and safe therapeutics. Our results suggest that PortalCG is a viable solution to the OOD problem in exploring understudied regions of protein functional space.
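The contrast between a differentiable pipeline and one that consumes fixed predicted structures can be illustrated with a deliberately tiny toy model. This is a hypothetical sketch, not PortalCG's architecture: a scalar "structure module" `s = tanh(a * x)` maps a sequence feature `x` to an intermediate representation, and a logistic "function head" maps `s` to a binding probability. With `end_to_end=True`, the binding loss is backpropagated through the intermediate layer into `a`; with `end_to_end=False`, `a` is frozen, mimicking a pipeline that screens against a fixed predicted structure. All names, data, and hyperparameters are assumptions.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loss_and_grads(data, a, b, c):
    """Mean binary cross-entropy of the toy sequence -> structure -> function
    model, with gradients for a (structure module) and b, c (function head)."""
    loss = ga = gb = gc = 0.0
    for x, y in data:
        s = math.tanh(a * x)          # hypothetical intermediate "structure" feature
        z = b * s + c                 # logit of the binding prediction
        # Stable BCE-with-logits: log(1 + e^z) - y*z
        loss += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
        err = sigmoid(z) - y          # dLoss/dz
        gb += err * s
        gc += err
        ga += err * b * (1.0 - s * s) * x   # chain rule through tanh into the structure module
    n = len(data)
    return loss / n, ga / n, gb / n, gc / n


def train(data, a, b, c, lr=0.5, epochs=200, end_to_end=True):
    """Full-batch gradient descent. end_to_end=False freezes the structure
    parameter a, so only the function head (b, c) is updated."""
    for _ in range(epochs):
        _, ga, gb, gc = loss_and_grads(data, a, b, c)
        b -= lr * gb
        c -= lr * gc
        if end_to_end:
            a -= lr * ga
    return a, b, c


if __name__ == "__main__":
    data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]  # toy (feature, label) pairs
    for mode in (False, True):
        a, b, c = train(data, 0.5, 0.0, 0.0, end_to_end=mode)
        print("end_to_end" if mode else "frozen", round(loss_and_grads(data, a, b, c)[0], 4))
```

The gradient `ga` is the only term that disappears when the structure module is frozen; keeping it lets the function-level loss correct the intermediate representation, which is the property the abstract attributes to treating structure as an intermediate layer.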

Funder

National Institute of General Medical Sciences

National Institute on Aging

National Science Foundation

Publisher

Public Library of Science (PLoS)

Subject

Computational Theory and Mathematics,Cellular and Molecular Neuroscience,Genetics,Molecular Biology,Ecology,Modeling and Simulation,Ecology, Evolution, Behavior and Systematics

References: 62 articles

1. MSA-Regularized Protein Sequence Transformer toward Predicting Genome-Wide Chemical-Protein Interactions: Application to GPCRome Deorphanization; T Cai; Journal of Chemical Information and Modeling; 2021

2. Few-shot learning creates predictive models of drug response that translate from high-throughput screens to individual patients; J Ma; Nature Cancer; 2021

3. A context-aware deconfounding autoencoder for robust prediction of personalized clinical drug response from cell-line compound screening; D He; Nature Machine Intelligence; 2022

4. Improved protein structure refinement guided by deep learning based accuracy estimation; N Hiranuma; Nature Communications; 2021

5. Highly accurate protein structure prediction with AlphaFold; J Jumper; Nature; 2021

Cited by 7 articles.

1. Semi-supervised meta-learning elucidates understudied molecular interactions; Communications Biology; 2024-09-09

2. Natural Product-Inspired Dopamine Receptor Ligands; Journal of Medicinal Chemistry; 2024-07-22

3. A bidirectional interpretable compound-protein interaction prediction framework based on cross attention; Computers in Biology and Medicine; 2024-04

4. Trust, Ethics, and User-Centric Design in AI-Integrated Genomics; 2024 2nd International Conference on Cyber Resilience (ICCR); 2024-02-26

5. TrustAffinity: accurate, reliable and scalable out-of-distribution protein-ligand binding affinity prediction using trustworthy deep learning; 2024-01-08