Real-time structural motif searching in proteins using an inverted index strategy

Published:2020-12-07 Issue:12 Volume:16 Page:e1008502
ISSN:1553-7358
Container-title:PLOS Computational Biology
Language:en
Short-container-title:PLoS Comput Biol

Author:

Bittrich Sebastian, Burley Stephen K., Rose Alexander S.

Abstract

Biochemical and biological functions of proteins are the product of both the overall fold of the polypeptide chain, and, typically, structural motifs made up of smaller numbers of amino acids constituting a catalytic center or a binding site that may be remote from one another in amino acid sequence. Detection of such structural motifs can provide valuable insights into the function(s) of previously uncharacterized proteins. Technically, this remains an extremely challenging problem because of the size of the Protein Data Bank (PDB) archive. Existing methods depend on a clustering by sequence similarity and can be computationally slow. We have developed a new approach that uses an inverted index strategy capable of analyzing >170,000 PDB structures with unmatched speed. The efficiency of the inverted index method depends critically on identifying the small number of structures containing the query motif and ignoring most of the structures that are irrelevant. Our approach (implemented at motif.rcsb.org) enables real-time retrieval and superposition of structural motifs, either extracted from a reference structure or uploaded by the user. Herein, we describe the method and present five case studies that exemplify its efficacy and speed for analyzing 3D structures of both proteins and nucleic acids.
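The inverted index idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the descriptor used here (sorted residue types plus a binned distance between one representative atom per residue), the function names `descriptor`, `build_index`, and `candidates`, and the toy coordinates are all assumptions made for illustration. The sketch shows only the filtering step, in which a lookup per residue pair narrows the archive to the few structures that could contain the motif.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: structure id -> list of (residue number, residue type, xyz).
structures = {
    "A": [(1, "HIS", (0, 0, 0)), (2, "ASP", (3, 0, 0)), (3, "SER", (0, 4, 0))],
    "B": [(1, "HIS", (0, 0, 0)), (2, "ASP", (10, 0, 0)), (3, "GLY", (0, 4, 0))],
    "C": [(1, "ALA", (0, 0, 0)), (2, "LEU", (3, 0, 0))],
}


def descriptor(res_a, res_b, bin_width=1.0):
    """Order-independent key for a residue pair (an assumed, simplified descriptor)."""
    (_, type_a, xyz_a), (_, type_b, xyz_b) = res_a, res_b
    first, second = sorted((type_a, type_b))
    return (first, second, int(math.dist(xyz_a, xyz_b) // bin_width))


def build_index(structures, max_dist=15.0):
    """Map each residue-pair descriptor to the set of structures that contain it."""
    index = defaultdict(set)
    for structure_id, residues in structures.items():
        for a, b in combinations(residues, 2):
            if math.dist(a[2], b[2]) <= max_dist:
                index[descriptor(a, b)].add(structure_id)
    return index


def candidates(index, motif, tolerance=1):
    """Return ids of structures that contain every residue pair of the motif.

    Each pair is looked up in the index (allowing +/- `tolerance` distance bins),
    and the per-pair hit sets are intersected, so structures lacking any pair
    are never inspected.
    """
    result = None
    for a, b in combinations(motif, 2):
        first, second, dist_bin = descriptor(a, b)
        hits = set()
        for delta in range(-tolerance, tolerance + 1):
            hits |= index.get((first, second, dist_bin + delta), set())
        result = hits if result is None else result & hits
        if not result:
            break  # no structure can match; stop early
    return result or set()


index = build_index(structures)
motif = [(1, "HIS", (0, 0, 0)), (2, "ASP", (3.2, 0, 0)), (3, "SER", (0, 4.1, 0))]
print(candidates(index, motif))  # {'A'}
```

In this sketch the index lookup only produces candidates. A structure can contain all of the required pairs without containing them on the same residues, so a real search would still need to assemble and verify the hits geometrically, for example by superposition onto the query motif.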

Funder

National Science Foundation

U.S. Department of Energy

National Institutes of Health

Publisher

Public Library of Science (PLoS)

Subject

Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modelling and Simulation; Ecology, Evolution, Behavior and Systematics

References: 39 articles.

1. Serine protease mechanism and specificity;L Hedstrom;Chemical reviews,2002

2. Molecular structure of leucine aminopeptidase at 2.7-A resolution;SK Burley;Proceedings of the National Academy of Sciences,1990

3. 3D Motifs

4. Design and selection of novel Cys2His2 zinc finger proteins;CO Pabo;Annual review of biochemistry,2001

Cited by 13 articles.

1. Non-covalent Lasso Entanglements in Folded Proteins: Prevalence, Functional Implications, and Evolutionary Significance;Journal of Molecular Biology;2024-01

2. pyScoMotif: Discovery of similar 3D structural motifs across proteins;2023-08-28

3. RCSB Protein Data Bank: Efficient Searching and Simultaneous Access to One Million Computed Structure Models Alongside the PDB Structures Enabled by Architectural Advances;Journal of Molecular Biology;2023-07

4. Dual-wield NTPases: a novel protein family mined from AlphaFold DB;2023-02-21

5. pyScoMotif: discovery of similar 3D structural motifs across proteins;Bioinformatics Advances;2023-01-01