Abstract
Protein structure analysis and classification, which is fundamental for predicting protein function, still poses formidable challenges in molecular biology, mathematics, physics and computer science. In the present work we exploit recent advances in computational topology to define a new intrinsic, unsupervised topological fingerprint for proteins. These fingerprints, computed via Local Euler Curvatures (LECs), identify protein secondary structures, such as helices and sheets, by capturing their distinctive topological signatures. Using an extensive protein residue database, the proposed computational framework not only distinguishes between structural classes via unsupervised clustering but also classifies protein structures with high accuracy using a supervised machine learning classifier. We also show that the internal structure of LEC space encodes information about the secondary structure of proteins. Beyond its immediate implications for critical application areas such as drug design and biotechnology, our approach opens an avenue towards characterizing the multiscale structures of diverse biopolymers based solely on their geometric and topological attributes.
Publisher
Cold Spring Harbor Laboratory