Efficient Group K Nearest-Neighbor Spatial Query Processing in Apache Spark

Published:2021-11-11 Issue:11 Volume:10 Page:763
ISSN:2220-9964
Container-title:ISPRS International Journal of Geo-Information
language:en
Short-container-title:IJGI

Author:

Moutafis Panagiotis, Mavrommatis George, Vassilakopoulos Michael, Corral Antonio

Abstract

In distributed computing systems, designing and implementing new distributed spatial query algorithms remains a current challenge. Apache Spark is a memory-based framework suitable for real-time and batch processing. Spark-based systems allow users to work on distributed in-memory data without worrying about the data distribution mechanism or fault tolerance. Given two datasets of points (called Query and Training), the group K nearest-neighbor (GKNN) query retrieves the K points of the Training dataset with the smallest sum of distances to every point of the Query dataset. This spatial query has been actively studied in centralized environments, where several performance-improving techniques and pruning heuristics have been proposed, and a distributed algorithm in Apache Hadoop was recently proposed by our team. Since Apache Hadoop generally exhibits lower performance than Spark, in this paper we present the first distributed GKNN query algorithm in Apache Spark and compare it against the one in Apache Hadoop. The algorithm incorporates programming features and facilities that are specific to Apache Spark, together with performance-improving techniques that are applicable in Apache Spark. The results of an extensive set of experiments with real-world spatial datasets are presented, demonstrating that our Apache Spark GKNN solution, with its improvements, is efficient and clearly outperforms processing this query in Apache Hadoop.
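
To make the GKNN definition in the abstract concrete, the following is a minimal single-machine sketch in plain Python. It is not the distributed Spark algorithm of the paper and uses none of its pruning heuristics; it only evaluates the query definition by brute force. The function name `gknn_brute_force`, the use of Euclidean distance, and the sample points are assumptions made for illustration.

```python
import heapq
import math


def gknn_brute_force(query, training, k):
    """Return the k training points with the smallest sum of distances
    to all query points (Euclidean distance is an assumption here)."""

    def sum_of_distances(point):
        # Sum of distances from one training point to every query point.
        return sum(math.dist(point, q) for q in query)

    # Keep the k training points with the smallest distance sums.
    return heapq.nsmallest(k, training, key=sum_of_distances)


# Hypothetical sample data, for illustration only.
query = [(0.0, 0.0), (2.0, 0.0)]
training = [(1.0, 0.0), (5.0, 5.0), (1.0, 1.0), (-3.0, 0.0)]

result = gknn_brute_force(query, training, 2)
print(result)
```

This sketch scores every Training point against every Query point, so its cost grows with the product of the two dataset sizes; that cost is what pruning heuristics and distributed processing aim to reduce.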

Funder

Ministerio de Economía, Industria y Competitividad, Gobierno de España

Publisher

MDPI AG

Subject

Earth and Planetary Sciences (miscellaneous); Computers in Earth Sciences; Geography, Planning and Development

Link

https://www.mdpi.com/2220-9964/10/11/763/pdf

References: 51 articles.

1. SpatialHadoop: A MapReduce framework for spatial data

2. Spatial data management in apache spark: the GeoSpark perspective and beyond

3. Group nearest neighbor queries

Cited by 4 articles.

1. A Novel Query Method for Spatial Database Based on Improved K-Nearest Neighbor Algorithm;International Journal of Decision Support System Technology;2023-10-25

2. Defining and designing spatial queries: the role of spatial relationships;Geo-spatial Information Science;2023-05-17

3. A PID-Based kNN Query Processing Algorithm for Spatial Data;Sensors;2022-10-09

4. Intelligent Measurement of Coal Moisture Based on Microwave Spectrum via Distance-Weighted kNN;Applied Sciences;2022-06-18