PrePCI: A structure‐ and chemical similarity‐informed database of predicted protein compound interactions

Author:

Trudeau Stephen J. (1, 2), Hwang Howook (1, 3), Mathur Deepika (1, 4, 5), Begum Kamrun (1), Petrey Donald (1), Murray Diana (1), Honig Barry (1, 6, 7, 8)

Affiliation:

1. Department of Systems Biology, Columbia University Irving Medical Center, New York, New York, USA

2. Integrated Graduate Program in Cellular, Molecular and Biomedical Studies (CMBS), Columbia University Irving Medical Center, New York, New York, USA

3. Schrödinger, Inc., New York, New York, USA

4. Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA

5. Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, USA

6. Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, New York, USA

7. Department of Medicine, Columbia University, New York, New York, USA

8. Zuckerman Mind Brain and Behavior Institute, Columbia University, New York, New York, USA

Abstract

We describe the Predicting Protein–Compound Interactions (PrePCI) database, which comprises over 5 billion predicted interactions between 6.8 million chemical compounds and 19,797 human proteins. PrePCI relies on a proteome‐wide database of structural models built from both traditional modeling techniques and the AlphaFold Protein Structure Database. Sequence‐ and structural similarity‐based metrics are established between template proteins, T, in the Protein Data Bank that bind compounds, C, and query proteins, Q, in the model database. When the metrics exceed threshold values, C is assumed to also bind Q, with a likelihood ratio (LR) derived from machine learning. If the relationship is based on structural similarity, the LR is based on a scoring function that measures the extent to which C is compatible with the binding site of Q, as described in the LT‐scanner algorithm. For every predicted complex derived in this way, chemical similarity based on the Tanimoto coefficient identifies other small molecules that may bind to Q. An overall LR for the binding of C to Q is obtained from naive Bayesian statistics. The PrePCI database can be queried by entering a UniProt ID or gene name for a protein, which returns a list of compounds predicted to bind to it along with their associated LRs. Alternatively, entering an identifier for a compound returns a list of proteins it is predicted to bind. Specific applications of the database to lead discovery, elucidation of drug mechanism of action, and biological function annotation are described.
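
The two quantitative ingredients named in the abstract, the Tanimoto coefficient and a naive Bayesian combination of likelihood ratios, can be illustrated with a minimal sketch. This is not the PrePCI implementation: the fingerprint representation (sets of on-bit indices), the function names, and the example LR values are all hypothetical, and the abstract does not specify which fingerprints or LR models the database uses. The sketch assumes only the standard definitions: Tanimoto similarity is the shared bits divided by the union of bits, and under the naive Bayes assumption of conditionally independent evidence the overall LR is the product of the individual LRs.

```python
from math import prod
from typing import Iterable, Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def combine_likelihood_ratios(lrs: Iterable[float]) -> float:
    """Naive Bayes combination: multiply LRs, assuming independent evidence."""
    return prod(lrs)


# Hypothetical fingerprints for a template-bound compound C and a candidate D.
fp_c = {1, 4, 7, 9}
fp_d = {1, 4, 9, 12, 15}
similarity = tanimoto(fp_c, fp_d)  # 3 shared bits / 6 bits in the union

# Hypothetical LRs from two evidence sources (e.g. sequence and structure).
overall_lr = combine_likelihood_ratios([4.0, 2.5])

print(similarity, overall_lr)
```

An overall LR above 1 means the combined evidence favors binding over non-binding. Multiplying by prior odds would give posterior odds, but the abstract does not state a prior, so none is assumed here.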

Publisher

Wiley

Subject

Molecular Biology, Biochemistry
