Author:
Greener Joe G,Jamali Kiarash
Abstract
AbstractComparing and searching protein structures independent of primary sequence has proved useful for remote homology detection, function annotation and protein classification. With the recent leap in accuracy of protein structure prediction methods and increased availability of protein models, attention is turning to how to best make use of this data. Fast and accurate methods to search databases of millions of structures will be essential to this endeavour, in the same way that fast protein sequence searching underpins much of bioinformatics. We train a simple graph neural network using supervised contrastive learning to learn a low-dimensional embedding of protein structure. The embedding can be used to search structures against large structural databases with accuracy comparable to current methods. The speed of the method and ability to scale to millions of structures makes it suitable for this structure-rich era. The method, called Progres, is available athttps://github.com/jgreener64/progres.
Publisher
Cold Spring Harbor Laboratory