Affiliation:
1. AT&T Bell Laboratories, Murray Hill, NJ
2. Dept. of Computer Science, Univ. of Maryland, College Park
Abstract
A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions provided by a domain expert [25]. We can then use highly fine-tuned spatial access methods (SAMs) to answer several types of queries, including the 'Query By Example' type (which translates to a range query), the 'all pairs' query (which translates to a spatial join [8]), and the nearest-neighbor or best-match query.
However, designing feature-extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information, though, it is not obvious how to map objects into points.
This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dissimilarities are preserved. There are two benefits from this mapping: (a) efficient retrieval, in conjunction with a SAM, as discussed before; and (b) visualization and data mining: the objects can now be plotted as points in 2-d or 3-d space, revealing potential clusters, correlations among attributes, and other regularities that data mining is looking for.
We introduce an older method from pattern recognition, namely Multi-Dimensional Scaling (MDS) [51]; although it is unsuitable for indexing, we use it as a yardstick for our method. Then we propose a much faster algorithm to solve the problem at hand, which in addition allows for indexing. Experiments on real and synthetic data indeed show that the proposed algorithm is significantly faster than MDS (being linear, as opposed to quadratic, in the database size N), while it manages to preserve distances and the overall structure of the data set.
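The abstract does not spell out the algorithm, so the following Python sketch is only an illustration of the general idea: turning objects into k-d points when nothing but a distance function is available. It projects every object onto the line through two far-apart "pivot" objects using the law of cosines, then repeats on the residual distances for each further axis. The function name `pivot_embed`, its parameters, and the pivot-selection heuristic are assumptions made for this example, not details taken from the abstract.

```python
import math
import random


def pivot_embed(objects, dist, k, seed=0):
    """Map each object to a k-d point using only a distance function.

    objects: list of arbitrary items
    dist:    symmetric function returning a non-negative distance
    k:       target dimensionality (user-defined)
    """
    rng = random.Random(seed)
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def residual(i, j, axis):
        # Squared distance that remains after subtracting the contribution
        # of the coordinates already assigned on axes 0 .. axis-1.
        d2 = dist(objects[i], objects[j]) ** 2
        for h in range(axis):
            d2 -= (coords[i][h] - coords[j][h]) ** 2
        return max(d2, 0.0)  # clamp small negative values

    for axis in range(k):
        # Heuristic pivot choice: start anywhere, then hop twice to the
        # farthest object under the residual distance.
        a = rng.randrange(n)
        b = max(range(n), key=lambda j: residual(a, j, axis))
        a = max(range(n), key=lambda j: residual(b, j, axis))
        d_ab2 = residual(a, b, axis)
        if d_ab2 == 0.0:
            break  # nothing left to separate; remaining axes stay 0
        d_ab = math.sqrt(d_ab2)
        for i in range(n):
            # Law of cosines: position of object i along the line a-b.
            coords[i][axis] = (
                residual(a, i, axis) + d_ab2 - residual(b, i, axis)
            ) / (2.0 * d_ab)
    return coords
```

For example, `pivot_embed(points, math.dist, 2)` would return one 2-d coordinate pair per input object, suitable for plotting or for loading into a spatial index. Each axis needs only a number of distance evaluations proportional to the number of objects, which is in keeping with the linear scaling the abstract reports.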
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems, Software
References: 55 articles.
1. Mining association rules between sets of items in large databases
2. Basic local alignment search tool
3. A prototype 3-d medical image database system; Arya Manish; IEEE Data Engineering Bulletin, 1993
Cited by
248 articles.