Affiliation:
1. Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai 200032, China
2. Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, Shanghai 200032, China
Abstract
Summary
The biological functions of proteins are determined by the chemical and geometric properties of their surfaces. With the rapid progress of deep learning, a series of learning-based surface descriptors have been proposed and have achieved impressive performance in tasks such as protein design and protein–protein interaction prediction. However, these descriptors remain limited by label scarcity, since labels are typically obtained through wet-lab experiments. Inspired by the success of self-supervised learning in natural language processing and computer vision, we introduce ProteinMAE, a self-supervised framework designed specifically for protein surface representation to mitigate label scarcity. We propose an efficient network and pretrain it with self-supervised learning on a large amount of accessible unlabeled protein data. We then use the pretrained weights as initialization and fine-tune the network on downstream tasks. To demonstrate the effectiveness of our method, we conduct experiments on three downstream tasks: binding site identification on protein surfaces, ligand-binding protein pocket classification, and protein–protein interaction prediction. These experiments show that pretraining improves the network's performance on all three downstream tasks and that the resulting models are competitive with state-of-the-art methods. Moreover, the proposed network is markedly cheaper to run, requiring less than a tenth of the memory of previous methods.
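The summary does not spell out the pretraining objective. The name ProteinMAE suggests masked autoencoding, so the following is only a minimal sketch under that assumption: a protein surface is treated as a set of patches, a random subset is hidden, and the reconstruction error is measured on the hidden patches alone. The function names, the mask ratio, and the use of mean squared error are all hypothetical and are not taken from the paper or its repository.

```python
import random


def split_masked_visible(num_patches, mask_ratio=0.5, seed=0):
    """Randomly partition patch indices into masked and visible sets.

    mask_ratio is an illustrative value, not one reported by the authors.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = round(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return masked, visible


def masked_reconstruction_loss(targets, predictions, masked):
    """Mean squared error over the masked patches only.

    targets and predictions are lists of per-patch feature vectors
    (lists of floats); visible patches do not contribute to the loss.
    """
    total, count = 0.0, 0
    for i in masked:
        for t, p in zip(targets[i], predictions[i]):
            total += (t - p) ** 2
            count += 1
    return total / count if count else 0.0
```

In such a setup, an encoder would see only the visible patches during pretraining, and the encoder weights learned this way would then initialize the network that is fine-tuned on each labeled downstream task.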
Availability and implementation
https://github.com/phdymz/ProteinMAE.
Funder
Technology Innovation Plan Of Shanghai Science and Technology Commission
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability
Cited by
3 articles.