Heterogeneous CPU-GPU Epsilon Grid Joins: Static and Dynamic Work Partitioning Strategies

Published: 2020-10-21
Volume: 6, Issue: 1, Pages: 39-62
ISSN: 2364-1185
Journal: Data Science and Engineering (Data Sci. Eng.)
Language: English

Authors: Benoit Gallet, Michael Gowanlock

Abstract

Given two datasets (or tables) $A$ and $B$ and a search distance $\epsilon$, the distance similarity join, denoted $A \ltimes_\epsilon B$, finds all pairs of points $(p_a, p_b)$ with $p_a \in A$ and $p_b \in B$ such that the distance between $p_a$ and $p_b$ is $\le \epsilon$. If $A = B$, the similarity join is equivalent to a similarity self-join, denoted $A \bowtie_\epsilon A$.

In this paper, we propose Heterogeneous Epsilon Grid Joins (HEGJoin), a heterogeneous CPU-GPU distance similarity join algorithm. Efficiently partitioning the work between the CPU and the GPU is challenging: the partitioning strategy must account for the different characteristics and computational throughput of the two processors, as well as for the data-dependent factors that drive the overall execution time of the similarity join (e.g., the number of queries, their distribution, and the dimensionality). In addition to HEGJoin, we design one dynamic and two static work partitioning strategies, and we propose a performance model for each static strategy that determines how the work is distributed between the processors. We evaluate all three partitioning strategies using the execution time and the load imbalance between the CPU and the GPU as performance metrics. HEGJoin achieves speedups of up to $5.46\times$ ($3.97\times$) over the GPU-only (CPU-only) algorithms on our first test platform, and up to $1.97\times$ ($12.07\times$) over the GPU-only (CPU-only) algorithms on our second test platform.
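To make the definition of $A \ltimes_\epsilon B$ concrete, the following Python sketch computes the join in two ways. The first compares every pair of points. The second indexes $B$ in a uniform grid with cell side length $\epsilon$ and compares only points in adjacent cells, which is the general idea behind grid-based $\epsilon$-joins. This is a minimal single-threaded sketch under stated assumptions: distances are Euclidean, points are equal-length tuples, and the names `brute_force_join`, `cell_of` and `grid_join` are hypothetical. It is not HEGJoin's CPU or GPU implementation.

```python
import itertools
import math
from collections import defaultdict


def brute_force_join(A, B, eps):
    """All index pairs (i, j) with dist(A[i], B[j]) <= eps, found by comparing every pair."""
    return {
        (i, j)
        for i, a in enumerate(A)
        for j, b in enumerate(B)
        if math.dist(a, b) <= eps  # Euclidean distance (assumption)
    }


def cell_of(p, eps):
    """Grid cell of point p in a uniform grid whose cells have side length eps."""
    return tuple(math.floor(x / eps) for x in p)


def grid_join(A, B, eps):
    """Same result as brute_force_join, but only compares points in adjacent grid cells."""
    if not A:
        return set()
    # Index B: map each non-empty cell to the indices of the points it contains.
    grid = defaultdict(list)
    for j, b in enumerate(B):
        grid[cell_of(b, eps)].append(j)

    # All 3^d offsets (including the zero offset) that make up a cell's neighborhood.
    offsets = list(itertools.product((-1, 0, 1), repeat=len(A[0])))
    result = set()
    for i, a in enumerate(A):
        home = cell_of(a, eps)
        for off in offsets:
            neighbor = tuple(h + o for h, o in zip(home, off))
            for j in grid.get(neighbor, ()):  # .get does not create empty cells
                if math.dist(a, B[j]) <= eps:  # exact distance check on each candidate
                    result.add((i, j))
    return result


A = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
B = [(0.5, 0.0), (4.6, 5.0), (9.0, 9.0)]
print(sorted(grid_join(A, B, eps=1.0)))  # [(0, 0), (2, 1)]
```

A point within $\epsilon$ of $p_a$ differs from it by at most $\epsilon$ in every coordinate, so it lies in $p_a$'s cell or in one of the $3^d - 1$ neighboring cells. The grid therefore prunes most distance computations without missing pairs, apart from floating-point rounding exactly at cell boundaries. For the self-join $A \bowtie_\epsilon A$, call `grid_join(A, A, eps)`. The result then also contains each point paired with itself.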
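A toy simulation can illustrate the trade-off between static and dynamic work partitioning. The sketch below is hypothetical and does not reproduce the paper's strategies or performance models. It assumes that each query has a known cost and that each processor executes work at a constant rate (`cpu_rate`, `gpu_rate`). The static variant fixes the split before execution using that linear model. In the dynamic variant, whichever processor becomes idle first takes more work from a shared queue: the GPU takes batches of the most expensive queries from the front, and the CPU takes single queries from the back. The `imbalance` metric is also an assumed definition.

```python
from collections import deque


def static_split(costs, cpu_rate, gpu_rate):
    """Hypothetical static partitioning driven by a linear performance model.

    Assumes time = work / rate on each processor, so both finish together when the
    GPU receives gpu_rate / (cpu_rate + gpu_rate) of the total work.
    Returns (gpu_time, cpu_time).
    """
    total = sum(costs)
    target = total * gpu_rate / (cpu_rate + gpu_rate)
    gpu_work = 0.0
    for c in costs:  # the split is decided once, before execution starts
        if gpu_work >= target:
            break
        gpu_work += c
    return gpu_work / gpu_rate, (total - gpu_work) / cpu_rate


def dynamic_split(costs, cpu_rate, gpu_rate, gpu_batch=8):
    """Hypothetical dynamic partitioning through a shared work queue.

    Queries are sorted by decreasing cost; the GPU takes batches from the front and
    the CPU takes single queries from the back. Returns (gpu_time, cpu_time).
    """
    queue = deque(sorted(costs, reverse=True))
    clock = {"cpu": 0.0, "gpu": 0.0}
    while queue:
        who = min(clock, key=clock.get)  # the processor that becomes idle first takes work
        if who == "gpu":
            batch = [queue.popleft() for _ in range(min(gpu_batch, len(queue)))]
            clock["gpu"] += sum(batch) / gpu_rate
        else:
            clock["cpu"] += queue.pop() / cpu_rate
    return clock["gpu"], clock["cpu"]


def imbalance(t_gpu, t_cpu):
    """Assumed metric: fraction of the run during which the first-finishing processor is idle."""
    longest = max(t_gpu, t_cpu)
    return abs(t_gpu - t_cpu) / longest if longest > 0 else 0.0


# Synthetic per-query costs (not data from the paper).
costs = [float(1 + (k * 7) % 20) for k in range(1000)]
for name, split in (("static", static_split), ("dynamic", dynamic_split)):
    t_gpu, t_cpu = split(costs, cpu_rate=1.0, gpu_rate=4.0)
    print(name, round(t_gpu, 1), round(t_cpu, 1), round(imbalance(t_gpu, t_cpu), 3))
```

In this toy setting, the static split is accurate only because the model's rates match the simulated ones. If the actual GPU throughput differed from `gpu_rate`, the static split would inherit that error. The dynamic scheme instead adapts at run time, at the cost of coordinating through the shared queue.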

Funder

National Science Foundation

Publisher

Springer Science and Business Media LLC