Abstract
Motivation
Structures are taking over the role once played by sequences. Traditional bioinformatics research focused on sequences because they were easily obtained. Advances in techniques such as cryo-electron microscopy, molecular modeling, docking algorithms, and structure prediction software have shifted the focus to structures. Given the importance of deep learning in many of these breakthroughs, it makes sense to also explore how it can modernize classic bioinformatics tools. However, empirical findings have shown that machine learning-based methods have many pitfalls that lead to overoptimistic conclusions, including data leakage between test and training data. Thus, there is a need for innovations that make neural networks more intelligible.
Results
We have developed vanGOGH, a geometric deep learning-based structural alignment approach that performs on par with the state of the art without ever having been trained on a pair of naturally found homologs. We adopted a data-centric approach to address deep learning and data limitations by augmenting protein templates into synthetic homologs for training. Our method allowed us to supplement homolog data through knowledge-driven augmentation, to learn relevant structural features from supervised examples, and to produce protein alignments that are competitive with state-of-the-art methods.
Availability
GNN framework: https://github.com/DeepRank/deeprank-core/tree/main/deeprankcore
Contact
Li.Xue@radboudumc.nl
Publisher
Cold Spring Harbor Laboratory