
LocationSpark

Published:2016-09 Issue:13 Volume:9 Page:1565-1568
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Tang Mingjie¹,Yu Yongyang¹,Malluhi Qutaibah M.²,Ouzzani Mourad³,Aref Walid G.¹

Affiliation:

1. Purdue University

2. Qatar University

3. Qatar Computing Research Institute, HBKU

Abstract

We present LocationSpark, a spatial data processing system built on top of Apache Spark, a widely used distributed data processing system. LocationSpark offers a rich set of spatial query operators, e.g., range search, kNN, spatio-textual operation, spatial-join, and kNN-join. To achieve high performance, LocationSpark employs various spatial indexes for in-memory data, and guarantees that immutable spatial indexes have low overhead with fault tolerance. In addition, we build two new layers over Spark, namely a query scheduler and a query executor. The query scheduler is responsible for mitigating skew in spatial queries, while the query executor selects the best plan based on the indexes and the nature of the spatial queries. Furthermore, to avoid unnecessary network communication overhead when processing overlapped spatial data, we embed an efficient spatial Bloom filter into LocationSpark's indexes. Finally, LocationSpark tracks frequently accessed spatial data, and dynamically flushes less frequently accessed data to disk. We evaluate our system on real workloads and demonstrate that it achieves an order of magnitude performance gain over a baseline framework.
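
The network-saving idea mentioned in the abstract can be illustrated with a small Python sketch. It is not LocationSpark's implementation: the class `GridBloomFilter`, the helper `partitions_to_query`, the grid cell size, and the bit and hash counts are all hypothetical choices made for illustration. Each partition keeps a Bloom filter over the grid cells that hold its points, and a range query is forwarded only to partitions whose filter reports a possible overlap.

```python
import hashlib
from typing import Dict, Iterator, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Cell = Tuple[int, int]


class GridBloomFilter:
    """Bloom filter over grid cells (illustrative assumption, not the paper's filter)."""

    def __init__(self, cell_size: float = 1.0, num_bits: int = 1024, num_hashes: int = 3):
        self.cell_size = cell_size
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int serves as the bit array

    def _cell(self, x: float, y: float) -> Cell:
        # Map a coordinate to the integer grid cell that contains it.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def _positions(self, cell: Cell) -> Iterator[int]:
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{cell[0]}:{cell[1]}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add_point(self, point: Point) -> None:
        for pos in self._positions(self._cell(*point)):
            self.bits |= 1 << pos

    def _might_contain_cell(self, cell: Cell) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(cell))

    def might_intersect(self, rect: Rect) -> bool:
        # True if any grid cell touched by the rectangle may hold data.
        # False positives are possible; false negatives are not.
        min_cx, min_cy = self._cell(rect[0], rect[1])
        max_cx, max_cy = self._cell(rect[2], rect[3])
        return any(
            self._might_contain_cell((cx, cy))
            for cx in range(min_cx, max_cx + 1)
            for cy in range(min_cy, max_cy + 1)
        )


def partitions_to_query(filters: Dict[str, GridBloomFilter], rect: Rect) -> List[str]:
    """Return the partition ids that a range query needs to be sent to."""
    return [pid for pid, f in filters.items() if f.might_intersect(rect)]


# Usage: partition "a" holds two points, partition "empty" holds none.
filters = {"a": GridBloomFilter(), "empty": GridBloomFilter()}
filters["a"].add_point((0.5, 0.5))
filters["a"].add_point((1.5, 2.5))
targets = partitions_to_query(filters, (0.0, 0.0, 1.0, 1.0))
print(targets)
```

Because a Bloom filter never reports a false negative, a partition that is skipped is guaranteed to hold no point in the queried cells. A false positive only costs one unnecessary query, so correctness is preserved either way.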

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3007263.3007310

Cited by 117 articles.

1. CUPID: An efficient spatio-temporal data engine;Future Generation Computer Systems;2024-12

2. Three-dimensional Geospatial Interlinking with JedAI-spatial;Journal of Web Semantics;2024-07

3. GridMesa: A NoSQL-based big spatial data management system with an adaptive grid approximation model;Future Generation Computer Systems;2024-06

4. RayJoin: Fast and Precise Spatial Join;Proceedings of the 38th ACM International Conference on Supercomputing;2024-05-30

5. TMan: A High-Performance Trajectory Data Management System Based on Key-Value Stores;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13