Author:
Oruç Tuğçe, Kadukova Maria, Davies Thomas G., Verdonk Marcel, Poelking Carl
Abstract
Binding sites are the key interfaces that determine a protein’s biological activity, and are therefore common targets for therapeutic intervention. Techniques that help us detect, compare and contextualise binding sites are hence of immense interest to drug discovery. Here we present an approach that integrates protein language models with a 3D tessellation technique to derive rich and versatile representations of binding sites that combine functional, structural and evolutionary information with unprecedented detail. We demonstrate that the associated similarity metrics induce meaningful pocket clusterings by balancing local structure against global sequence effects. The resulting embeddings are shown to simplify a variety of downstream tasks: they help organise the “pocketome” in a way that efficiently contextualises new binding sites, construct performant druggability models, and define challenging train-test splits for believable benchmarking of pocket-centric machine-learning models.
Publisher
Cold Spring Harbor Laboratory