Author:
Wang Hui, Liu Dong, Zhao Kai-Long, Wang Ya-Jun, Zhang Gui-Jun
Abstract
Sequence design is a key component of designing proteins with a specified structure and function, and it can provide valuable insights for understanding living systems as well as for the diagnosis and therapy of diseases. Although deep learning methods have made great progress in protein sequence design, most of these studies focus on optimizing the network architecture while ignoring protein-specific physicochemical features. Inspired by the successful application of structure templates and pre-trained models in protein structure prediction, we explored whether a structural sequence profile representation can be used for protein sequence design. In this work, we propose SPDesign, a method for protein sequence design based on structural sequence profiles obtained with ultrafast shape recognition. Given an input backbone structure, SPDesign uses ultrafast shape recognition vectors to accelerate the search for similar protein structures (i.e., structural analogs) in our in-house PAcluster80 structure database, and then extracts a sequence profile from the analogs through structure alignment. The profile is combined with structural pre-trained knowledge and geometric features, and together they are fed into an enhanced graph neural network to predict the sequence. Experimental results show that SPDesign significantly outperforms state-of-the-art methods such as ProteinMPNN, PiFold and LM-Design, leading to 21.89%, 15.54% and 11.4% accuracy gains in sequence recovery rate on the CATH 4.2 benchmark, respectively. Encouraging results have also been achieved on the TS50 and TS500 benchmarks, with performance reaching 68.64% and 71.63%, respectively. Furthermore, detailed analysis conducted with the PDBench tool suggests that SPDesign performs well on structural subclasses such as buried residues and solenoids. More interestingly, we found that SPDesign can accurately reconstruct the sequences of some proteins that have similar structures but different sequences. Finally, a structural modeling verification experiment confirms that the sequences designed by our method fold into the native structures more accurately.
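The two quantities the abstract relies on, ultrafast shape recognition (USR) vectors and sequence recovery rate, can be illustrated with a minimal sketch. This is not the SPDesign implementation: it assumes the commonly used generic USR formulation (12 values built from the distance distributions to four reference points, compared with an inverse mean absolute difference), and it assumes the input is a plain list of 3D coordinates such as C-alpha atoms. The function names `usr_descriptor`, `usr_similarity` and `sequence_recovery` are hypothetical, and the use of the standard deviation and a signed cube root for the second and third moments is one common variant, chosen here as an assumption.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _moments(values: Sequence[float]) -> List[float]:
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    second = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, math.sqrt(second), math.copysign(abs(third) ** (1.0 / 3.0), third)]


def usr_descriptor(coords: Sequence[Point]) -> List[float]:
    """Generic 12-value USR vector (assumed formulation, not SPDesign's own code)."""
    n = len(coords)
    centroid = tuple(sum(p[k] for p in coords) / n for k in range(3))
    # Four reference points: centroid, atom closest to it, atom farthest
    # from it, and the atom farthest from that farthest atom.
    closest = min(coords, key=lambda p: math.dist(p, centroid))
    farthest = max(coords, key=lambda p: math.dist(p, centroid))
    far_from_far = max(coords, key=lambda p: math.dist(p, farthest))
    descriptor: List[float] = []
    for ref in (centroid, closest, farthest, far_from_far):
        descriptor.extend(_moments([math.dist(p, ref) for p in coords]))
    return descriptor


def usr_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Score in (0, 1]; 1 means identical descriptors."""
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mean_abs_diff)


def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    return sum(d == n for d, n in zip(designed, native)) / len(native)
```

Because the descriptor depends only on interatomic distances, it is unchanged by translation and rotation, so candidate analogs can be ranked by comparing short vectors before any costly structure alignment is run. How SPDesign thresholds or ranks these scores is not stated in the abstract and is not assumed here.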
Publisher
Cold Spring Harbor Laboratory