AN ASSESSMENT OF A METRIC SPACE DATABASE INDEX TO SUPPORT SEQUENCE HOMOLOGY

Published:2005-10 Issue:05 Volume:14 Page:867-885
ISSN:0218-2130
Container-title:International Journal on Artificial Intelligence Tools
language:en
Short-container-title:Int. J. Artif. Intell. Tools

Author:

MAO RUI¹, XU WEIJIA¹, SINGH NEHA¹, MIRANKER DANIEL P.¹

Affiliation:

1. Department of Computer Sciences, The University of Texas at Austin, 1 University Station C0500, Austin, TX 78712-0233, USA

Abstract

Hierarchical metric-space clustering methods have been commonly used to organize proteomes into taxonomies. Consequently, it is often anticipated that hierarchical clustering can be leveraged as a basis for scalable database index structures capable of managing the hyper-exponential growth of sequence data. M-tree is one such data structure specialized for the management of large data sets on disk. We explore the application of M-trees to the storage and retrieval of peptide sequence data. Exploiting a technique first suggested by Myers, we organize the database as records of fixed-length substrings. Empirical results are promising. However, metric-space indexes are subject to "the curse of dimensionality" and the ultimate performance of an index is sensitive to the quality of the initial construction of the index. We introduce a new hierarchical bulk-load algorithm that alternates between top-down and bottom-up clustering to initialize the index. Using the Yeast Proteomes, the bi-directional bulk load produces a more effective index than the existing M-tree initialization algorithms.
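
The two ideas the abstract combines, storing a sequence as fixed-length substring records and pruning a search with a metric, can be illustrated with a minimal sketch. This is not the paper's M-tree or its bulk-load algorithm: it assumes a single pivot in place of a tree, uses Hamming distance as a stand-in metric (the abstract does not name the distance function), and every function name and the sample peptide are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # (offset in the source sequence, substring)


def fixed_length_substrings(seq: str, q: int) -> List[Record]:
    """Split a sequence into every overlapping window of length q."""
    return [(i, seq[i:i + q]) for i in range(len(seq) - q + 1)]


def hamming(a: str, b: str) -> int:
    """Hamming distance, a metric on equal-length strings (stand-in only)."""
    if len(a) != len(b):
        raise ValueError("hamming() needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def build_pivot_table(records: List[Record], pivot: str):
    """Precompute each record's distance to one pivot (assumed, not an M-tree)."""
    return [(hamming(s, pivot), off, s) for off, s in records]


def range_query(table, pivot: str, query: str, radius: int):
    """Return (hits, number of distance computations performed)."""
    dq = hamming(query, pivot)
    computed = 1  # the query-to-pivot distance
    hits: List[Record] = []
    for dp, off, s in table:
        # Triangle inequality: |d(q,p) - d(s,p)| <= d(q,s).
        # If that lower bound already exceeds the radius, s cannot match.
        if abs(dq - dp) > radius:
            continue
        computed += 1
        if hamming(query, s) <= radius:
            hits.append((off, s))
    return hits, computed


if __name__ == "__main__":
    seq = "MKVLAAGIVKVLA"  # hypothetical peptide, for illustration only
    records = fixed_length_substrings(seq, 4)
    pivot = records[0][1]
    table = build_pivot_table(records, pivot)
    hits, computed = range_query(table, pivot, "KVLA", 1)
    print(hits, computed, "of", len(records), "records")
```

A tree-structured index applies the same pruning test at each routing node rather than against one pivot, which is why the quality of the initial clustering affects how many distance computations a query needs.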

Publisher

World Scientific Pub Co Pte Lt

Subject

Artificial Intelligence

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218213005002430

Reference: 14 articles.

1. A new method for analyzing protein sequence relationships based on Sammon maps

2. SST: an algorithm for finding near-exact sequence matches in time proportional to the logarithm of the database size

Cited by 6 articles.

1. RPA: a memory-efficient metric-space recall@R ANNS index;Journal of Shenzhen University Science and Engineering;2023-11-01

2. Intelligent Indexing—Boosting Performance in Database Applications by Recognizing Index Patterns;Electronics;2020-08-20

3. Pivot selection for metric-space indexing;International Journal of Machine Learning and Cybernetics;2016-02-03

4. Pivot selection: Dimension reduction for distance-based indexing;Journal of Discrete Algorithms;2012-05

5. TESTING EMBEDDABILITY BETWEEN METRIC SPACES;International Journal of Foundations of Computer Science;2009-04