Epsilon grid order

Published:2001-06 Issue:2 Volume:30 Page:379-388
ISSN:0163-5808
Container-title:ACM SIGMOD Record
language:en
Short-container-title:SIGMOD Rec.

Author:

Böhm Christian¹, Braunmüller Bernhard¹, Krebs Florian¹, Kriegel Hans-Peter¹

Affiliation:

1. Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 München, Germany

Abstract

The similarity join is an important database primitive which has been successfully applied to speed up applications such as similarity search, data analysis and data mining. The similarity join combines two point sets of a multidimensional vector space such that the result contains all point pairs where the distance does not exceed a parameter ε. In this paper, we propose the Epsilon Grid Order, a new algorithm for determining the similarity join of very large data sets. Our solution is based on a particular sort order of the data points, which is obtained by laying an equi-distant grid with cell length ε over the data space and comparing the grid cells lexicographically. A typical problem of grid-based approaches such as MSJ or the ε-kdB-tree is that large portions of the data sets must be held simultaneously in main memory. Therefore, these approaches do not scale to large data sets. Our technique avoids this problem by an external sorting algorithm and a particular scheduling strategy during the join phase. In the experimental evaluation, a substantial improvement over competitive techniques is shown.
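The sort order described in the abstract can be illustrated with a minimal in-memory sketch. It assumes Euclidean distance and a self-join of one point set, and it deliberately omits the external sorting and scheduling strategy that the paper relies on for very large data sets. The function names (`grid_cell`, `epsilon_grid_sort`, `similarity_self_join`) are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
import math


def grid_cell(point, eps):
    """Integer coordinates of the grid cell (cell length eps) containing point."""
    return tuple(math.floor(x / eps) for x in point)


def epsilon_grid_sort(points, eps):
    """Sort points by comparing their grid cells lexicographically."""
    return sorted(points, key=lambda p: grid_cell(p, eps))


def similarity_self_join(points, eps):
    """Return all pairs of points whose Euclidean distance is at most eps.

    Illustrative assumption: everything fits in main memory. For a point p,
    any partner q within eps lies in a cell that is at most one cell away
    in every dimension, so in the sorted order the forward scan can stop
    once a cell exceeds cell(p) + (1, ..., 1) lexicographically.
    """
    ordered = epsilon_grid_sort(points, eps)
    cells = [grid_cell(p, eps) for p in ordered]
    pairs = []
    for i, p in enumerate(ordered):
        upper = tuple(c + 1 for c in cells[i])
        for j in range(i + 1, len(ordered)):
            if cells[j] > upper:  # tuple comparison is lexicographic
                break
            if math.dist(p, ordered[j]) <= eps:
                pairs.append((p, ordered[j]))
    return pairs


if __name__ == "__main__":
    sample = [(2.5, 2.5), (0.5, 1.1), (0.4, 0.2), (0.1, 0.1)]
    for a, b in similarity_self_join(sample, eps=1.0):
        print(a, b)
```

The early `break` is what the sort order buys: candidates for each point form a contiguous interval of the sorted sequence, which is the property that makes a sequential, disk-friendly join schedule possible.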

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/376284.375714

Cited by 35 articles.

1. Fast knowledge graph completion using graphics processing units;Journal of Parallel and Distributed Computing;2024-08

2. Similarity joins and clustering for SPARQL;Semantic Web;2024-03-06

3. Survey on Exact kNN Queries over High-Dimensional Data Space;Sensors;2023-01-05

4. Heterogeneous CPU-GPU Epsilon Grid Joins: Static and Dynamic Work Partitioning Strategies;Data Science and Engineering;2020-10-21

5. A coordinate-oblivious index for high-dimensional distance similarity searches on the GPU;Proceedings of the 34th ACM International Conference on Supercomputing;2020-06-29