PLINDER: The protein-ligand interactions dataset and evaluation resource

Published: 2024-07-17

Author:

Durairaj Janani, Adeshina Yusuf, Cao Zhonglin, Zhang Xuejin, Oleinikovas Vladas, Duignan Thomas, McClure Zachary, Robin Xavier, Studer Gabriel, Kovtun Daniel, Rossi Emanuele, Zhou Guoqing, Veccham Srimukh, Isert Clemens, Peng Yuxing, Sundareson Prabindh, Akdel Mehmet, Corso Gabriele, Stärk Hannes, Tauriello Gerardo, Carpenter Zachary, Bronstein Michael, Kucukbenli Emine, Schwede Torsten, Naef Luca

Abstract

Protein-ligand interactions (PLI) are foundational to small molecule drug design. With computational methods striving towards experimental accuracy, there is a critical demand for a well-curated and diverse PLI dataset. Existing datasets are often limited in size and diversity, and commonly used evaluation sets suffer from training information leakage, hindering the realistic assessment of method generalization capabilities. To address these shortcomings, we present PLINDER, the largest and most annotated dataset to date, comprising 449,383 PLI systems, each with over 500 annotations, similarity metrics at protein, pocket, interaction and ligand levels, and paired unbound (apo) and predicted structures. We propose an approach to generate training and evaluation splits that minimizes task-specific leakage and maximizes test set quality, and compare the resulting performance of DiffDock when retrained with different kinds of splits.

Publisher

Cold Spring Harbor Laboratory

References: 51 articles.

1. Argo Workflow (v3.5.8). https://github.com/argoproj.

2. NVIDIA BioNeMo (v1.4). https://www.nvidia.com/en-us/clara/bionemo.

3. Kubernetes (v1.30). https://kubernetes.io/.

4. Metaflow (v2.11.15). https://docs.metaflow.org/.

5. RDKit: Open-source cheminformatics. https://www.rdkit.org. Accessed: 2024-05-17.

Cited by 1 article.

1. Comparative evaluation of methods for the prediction of protein-ligand binding sites; 2024-08-08