Author:
Csaba Gergely,Birzele Fabian,Zimmer Ralf
Abstract
Abstract
Background
SCOP and CATH are widely used as gold standards to benchmark novel protein structure comparison methods as well as to train machine learning approaches for protein structure classification and prediction. The two hierarchies result from different protocols which may result in differing classifications of the same protein. Ignoring such differences leads to problems when being used to train or benchmark automatic structure classification methods. Here, we propose a method to compare SCOP and CATH in detail and discuss possible applications of this analysis.
Results
We create a new mapping between SCOP and CATH and define a consistent benchmark set which is shown to largely reduce errors made by structure comparison methods such as TM-Align and has useful further applications, e.g. for machine learning methods being trained for protein structure classification. Additionally, we extract additional connections in the topology of the protein fold space from the orthogonal features contained in SCOP and CATH.
Conclusion
Via an all-to-all comparison, we find that there are large and unexpected differences between SCOP and CATH w.r.t. their domain definitions as well as their hierarchic partitioning of the fold space on every level of the two classifications. A consistent mapping of SCOP and CATH can be exploited for automated structure comparison and classification.
Availability
Benchmark sets and an interactive SCOP-CATH browser are available at http://www.bio.ifi.lmu.de/SCOPCath.
Publisher
Springer Science and Business Media LLC
Reference25 articles.
1. Berman H, Westbrook J, Feng Z, Gilliland G, Bhat T, Weissig H, Shindyalov I, Bourne P: The Protein Data Bank. Nucleic Acids Research 2000, 28: 235–242. 10.1093/nar/28.1.235
2. Andreeva A, Howorth D, Chandonia J, Brenner S, Hubbard T, Chothia C, Murzin A: Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res 2008, 36: D419–425. 10.1093/nar/gkm993
3. Greene LH, Lewis TE, Addou S, Cuff A, Dallman T, Dibley M, Redfern O, Pearl F, Nambudiry R, Reid A, Sillitoe I, Yeats C, Thornton JM, Orengo CA: The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic acids research 2007, (35 Database):D291–7. 10.1093/nar/gkl959
4. Reeves G, Dallman T, Redfern O, Akpor A, Orengo C: Structural diversity of domain superfamilies in the CATH database. J Mol Biol 2006, 360: 725–741. 10.1016/j.jmb.2006.05.035
5. Todd A, Orengo C, Thornton J: Evolution of function in protein superfamilies, from a structural perspective. J Mol Biol 2001, 307: 1113–1143. 10.1006/jmbi.2001.4513
Cited by
59 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献