Affiliation:
1. DEIS---CSITE--CNR, University of Bologna, Bologna, Italy
Abstract
Novel database applications, such as multimedia, data mining, e-commerce, and many others, make intensive use of similarity queries in order to retrieve the objects that better fit a user request. Since the effectiveness of such queries improves when the user is allowed to personalize the similarity criterion according to which database objects are evaluated and ranked, the development of access methods able to efficiently support user-defined similarity queries becomes a basic requirement. In this article we introduce the first index structure, called the QIC-M-tree, that can process user-defined queries in generic metric spaces, that is, where the only information about indexed objects is their relative distances. The QIC-M-tree is a metric access method that can deal with several distinct distances at a time: (1) a
query (user-defined) distance
, (2) an
index distance
(used to build the tree), and (3) a
comparison (approximate) distance
(used to quickly discard from the search uninteresting parts of the tree). We develop an analytical cost model that accurately characterizes the performance of the QIC-M-tree and validate such model through extensive experimentation on real metric data sets. In particular, our analysis is able to predict the best evaluation strategy (i.e., which distances to use) under a variety of configurations, by properly taking into account relevant factors such as the distribution of distances, the cost of computing distances, and the actual index structure. We also prove that the overall saving in CPU search costs when using an approximate distance can be estimated by using information on the data set only (thus such measure is independent of the underlying access method) and show that performance results are closely related to a novel "indexing" error measure.
Publisher
Association for Computing Machinery (ACM)
Cited by
56 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献