Abstract
Given two datasets (or tables) $$A$$ and $$B$$ and a search distance $$\epsilon$$, the distance similarity join, denoted as $$A \ltimes_\epsilon B$$, finds the pairs of points ($$p_a$$, $$p_b$$), where $$p_a \in A$$ and $$p_b \in B$$, such that the distance between $$p_a$$ and $$p_b$$ is $$\le \epsilon$$. If $$A = B$$, the similarity join is equivalent to a similarity self-join, denoted as $$A \bowtie_\epsilon A$$. In this paper, we propose Heterogeneous Epsilon Grid Joins (HEGJoin), a heterogeneous CPU-GPU distance similarity join algorithm. Efficiently partitioning the work between the CPU and the GPU is challenging: the work partitioning strategy must account for the different characteristics and computational throughput of the two processors, as well as for the data-dependent nature of the similarity join, which affects the overall execution time (e.g., the number of queries, their distribution, and the dimensionality). In addition to HEGJoin, we design one dynamic and two static work partitioning strategies, and we propose a performance model for each static strategy to distribute the work between the processors. We evaluate all three partitioning methods using the execution time and the load imbalance between the CPU and the GPU as performance metrics. HEGJoin achieves speedups of up to $$5.46\times$$ ($$3.97\times$$) over the GPU-only (CPU-only) algorithms on our first test platform, and up to $$1.97\times$$ ($$12.07\times$$) over the GPU-only (CPU-only) algorithms on our second test platform.
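To make the join definition concrete, here is a small CPU-only sketch. It is not HEGJoin and uses no GPU. It assumes Euclidean distance, points given as equal-length tuples, and $$\epsilon > 0$$, and it indexes $$B$$ with a uniform grid of cell side $$\epsilon$$. The grid is an assumption suggested by the algorithm's name; this abstract does not specify the index. Because two points within $$\epsilon$$ of each other differ by at most $$\epsilon$$ in every coordinate, their cell indices differ by at most one per dimension, so scanning the $$3^d$$ surrounding cells finds every result pair. All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def dist(p, q):
    """Euclidean distance between two equal-length points (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def brute_force_join(A, B, eps):
    """Reference join: every index pair (i, j) with dist(A[i], B[j]) <= eps."""
    return {(i, j) for i, p in enumerate(A) for j, q in enumerate(B) if dist(p, q) <= eps}


def cell_of(p, eps):
    """Grid cell of point p for a uniform grid with cell side eps."""
    return tuple(math.floor(x / eps) for x in p)


def grid_join(A, B, eps):
    """Same result as brute_force_join, but B is bucketed into grid cells so each
    point of A only compares against points in its own and adjacent cells."""
    if not B:
        return set()
    d = len(B[0])
    grid = defaultdict(list)  # cell -> indices of B points in that cell
    for j, q in enumerate(B):
        grid[cell_of(q, eps)].append(j)

    result = set()
    for i, p in enumerate(A):
        cell = cell_of(p, eps)
        # Visit the 3^d cells whose indices differ by -1, 0 or +1 in each dimension.
        for offset in product((-1, 0, 1), repeat=d):
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            for j in grid.get(neighbor, ()):  # .get() does not create empty cells
                if dist(p, B[j]) <= eps:
                    result.add((i, j))
    return result
```

For example, with `A = [(0.0, 0.0), (1.0, 0.0)]`, `B = [(0.5, 0.0), (3.0, 3.0)]` and `eps = 0.6`, `sorted(grid_join(A, B, 0.6))` gives `[(0, 0), (1, 0)]`. Calling `grid_join(A, A, eps)` yields the self-join $$A \bowtie_\epsilon A$$, which also contains every trivial pair `(i, i)`. The number of cells scanned, $$3^d$$, grows quickly with the dimensionality, one way the cost of the join depends on the data.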
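A static work partitioning strategy fixes the CPU/GPU split before the join starts. The abstract does not describe its performance models, so the following is only a hypothetical illustration of the general idea. It assumes each processor's throughput (queries per second) was measured beforehand and stays constant, splits the queries in proportion to those rates so that both processors are predicted to finish together, and measures load imbalance as the relative gap between the two finish times. A constant per-query rate ignores the data-dependent cost the abstract emphasizes, so a realistic model would need more inputs. All names are hypothetical.

```python
from typing import Tuple


def static_split(n_queries: int, cpu_rate: float, gpu_rate: float) -> Tuple[int, int]:
    """Hypothetical static split: give each processor a share of the queries
    proportional to its assumed throughput. Returns (n_cpu, n_gpu)."""
    if n_queries < 0 or cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("need n_queries >= 0 and positive rates")
    # Predicted time on each side is n / rate; equal times require
    # n_gpu / n_queries = gpu_rate / (cpu_rate + gpu_rate).
    n_gpu = round(n_queries * gpu_rate / (cpu_rate + gpu_rate))
    return n_queries - n_gpu, n_gpu


def predicted_times(n_cpu: int, n_gpu: int, cpu_rate: float, gpu_rate: float) -> Tuple[float, float]:
    """Predicted finish times (seconds) under the same constant-rate assumption."""
    return n_cpu / cpu_rate, n_gpu / gpu_rate


def load_imbalance(t_cpu: float, t_gpu: float) -> float:
    """Hypothetical metric: relative gap between the two finish times
    (0.0 = perfectly balanced, close to 1.0 = one processor mostly idle)."""
    slowest = max(t_cpu, t_gpu)
    if slowest <= 0:
        return 0.0
    return (slowest - min(t_cpu, t_gpu)) / slowest
```

With a GPU assumed to be three times faster than the CPU, `static_split(1000, 1.0, 3.0)` returns `(250, 750)`. `predicted_times(250, 750, 1.0, 3.0)` then gives `(250.0, 250.0)`, and `load_imbalance(250.0, 250.0)` is `0.0`. Measured times would differ from these predictions whenever the per-query cost varies, which is what a dynamic strategy avoids by assigning work at run time.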
Funder
National Science Foundation
Publisher
Springer Science and Business Media LLC
Subject
Computer Science Applications, Computational Mechanics
Cited by
6 articles.