Author:
Kim Sungchul,Sael Lee,Yu Hwanjo
Abstract
Abstract
Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize an reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points less than θ to the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced 69.6%, 77%, 77.4% and 87.9%, respectively using iDistance, iKernel, the extended iDistance, and the extended iKernel. In θ-based nearest neighbor serach, the searching time is reduced 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.
Publisher
Springer Science and Business Media LLC
Subject
Health Informatics,Health Policy,Computer Science Applications
Reference32 articles.
1. Sael L, Li B, La D, Fang Y, Ramani K, Rustamov R, Kihara D: Fast protein tertiary structure retrieval based on global surface shape similarity. Proteins. 2008, 72: 1259-1273. 10.1002/prot.22030.
2. Deng K, Zhou X, Shen HT, Liu Q, Xu K, Lin X: A multi-resolution surface distance model for k-NN query processing. VLDB. 2008, 1101-1119.
3. Shen HT, Huang Z, Cao J, Zhou X: High-dimensional indexing with oriented cluster representation for multimedia databases. VLDB. 2009, ICME, 1628-1631.
4. Kim S, Lee S, Yu H: Indexing methods for efficient protein 3D surface search. Proceedings of the ACM Sixth International Workshop on Data and Text Mining in Biomedical Informatics. 2012, New York: ACM, 41-48. 10.1145/2390068.2390078.
5. Gibrat JF, Madej T, Bryant SH: Surprising similarities in structure comparison. Curr Opin Struct Biol. 1996, 6: 377-385. 10.1016/S0959-440X(96)80058-3.
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献