Unsupervised Representation Learning for Proteochemometric Modeling

Published: 2021-11-28
Volume: 22
Issue: 23
Page: 12882
ISSN: 1422-0067
Journal: International Journal of Molecular Sciences (IJMS)
Language: English

Authors:

Paul T. Kim, Robin Winter, Djork-Arné Clevert

Abstract

In silico protein–ligand binding prediction is an active area of research in computational chemistry and machine-learning-based drug discovery, as an accurate predictive model could greatly reduce the time and resources needed to detect and prioritize candidate drugs. Proteochemometric modeling (PCM) attempts to build an accurate model of the protein–ligand interaction space by combining explicit protein and ligand descriptors. This requires information-rich, uniform and computer-interpretable representations of both proteins and ligands. Previous PCM studies rely on predefined, handcrafted feature extraction methods, and many use protein descriptors that require alignment or are otherwise specific to a particular group of related proteins. However, recent advances in representation learning have shown that unsupervised machine learning can generate embeddings that outperform complex, human-engineered representations. Several embedding methods for proteins and molecules have been developed using language-modeling techniques. Here, we demonstrate the utility of these unsupervised representations and fairly compare three protein embeddings and two compound embeddings. We evaluate performance on various splits of a benchmark dataset, as well as on an internal dataset of protein–ligand binding activities, and find that representations learned without supervision significantly outperform handcrafted ones.
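The PCM idea summarised in the abstract, one model trained on descriptors that combine a protein and a ligand, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the 2-dimensional embeddings, the toy activity rule and the helper names (`pcm_features`, `fit_ridge`, `solve`) are hypothetical, and closed-form ridge regression stands in for whatever models the paper actually trains.

```python
from itertools import product
from typing import Dict, List, Sequence

Vector = List[float]


def pcm_features(protein_emb: Sequence[float], ligand_emb: Sequence[float]) -> Vector:
    """Hypothetical pair descriptor: protein embedding, ligand embedding, bias term."""
    return list(protein_emb) + list(ligand_emb) + [1.0]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(x: List[Vector], y: Vector, l2: float = 1e-9) -> Vector:
    """Closed-form ridge regression: w = (X^T X + l2 * I)^-1 X^T y."""
    d = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) + (l2 if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(d)]
    return solve(xtx, xty)


def predict(w: Vector, features: Vector) -> float:
    """Linear score for one protein-ligand pair descriptor."""
    return sum(wi * fi for wi, fi in zip(w, features))


# Hypothetical toy embeddings standing in for learned protein/compound embeddings.
PROTEINS: Dict[str, Vector] = {"P1": [1.0, 0.0], "P2": [0.0, 1.0], "P3": [1.0, 1.0]}
LIGANDS: Dict[str, Vector] = {"L1": [0.0, 1.0], "L2": [1.0, 0.0], "L3": [1.0, 1.0]}


def toy_activity(p: Vector, lig: Vector) -> float:
    """Made-up additive rule used only to generate illustrative labels."""
    return 5.0 + 1.0 * p[0] + 0.5 * p[1] + 2.0 * lig[0] - 1.0 * lig[1]


# Train on every protein-ligand pair except one, then score the unseen pair.
held_out = ("P3", "L3")
train = [pair for pair in product(PROTEINS, LIGANDS) if pair != held_out]
X = [pcm_features(PROTEINS[p], LIGANDS[lig]) for p, lig in train]
y = [toy_activity(PROTEINS[p], LIGANDS[lig]) for p, lig in train]
w = fit_ridge(X, y)
prediction = predict(w, pcm_features(PROTEINS["P3"], LIGANDS["L3"]))
print(f"predicted activity for unseen pair {held_out}: {prediction:.3f}")
```

Because every pair is encoded with both descriptors, the model can score a protein–ligand combination it never saw, provided the protein and the ligand each appear somewhere in the training pairs. A linear model on concatenated features only captures additive protein and ligand effects; modeling interaction terms would need a nonlinear learner.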

Funder

European Commission

Publisher

MDPI AG

Subject

Inorganic Chemistry, Organic Chemistry, Physical and Theoretical Chemistry, Computer Science Applications, Spectroscopy, Molecular Biology, General Medicine, Catalysis

Link

https://www.mdpi.com/1422-0067/22/23/12882/pdf

References (43 articles)

1. Polypharmacology modelling using proteochemometrics (PCM): recent methodological developments, applications to target families, and future prospects

2. Proteochemometric modeling as a tool to design selective compounds and for extrapolating to novel targets

3. Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set

4. QSAR Modeling: Where Have You Been? Where Are You Going To?

Cited by 6 articles

1. The Accurate Prediction of Antibody Deamidations by Combining High-Throughput Automated Peptide Mapping and Protein Language Model-Based Deep Learning. Antibodies, 2024-09-10.

2. HyperPCM: Robust Task-Conditioned Modeling of Drug–Target Interactions. Journal of Chemical Information and Modeling, 2024-01-08.

3. Using the local symmetry in amino acids sequences of polypeptides to improve the predictive potential of models of their inhibitor activity. Amino Acids, 2023-09-14.

4. How to approach machine learning-based prediction of drug/compound–target interactions. Journal of Cheminformatics, 2023-02-06.

5. Beyond sequence: Structure-based machine learning. Computational and Structural Biotechnology Journal, 2023.