Comparing synopsis techniques for approximate spatial data analysis

Published:2019-07 Issue:11 Volume:12 Page:1583-1596
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Siddique A. B.¹,Eldawy Ahmed¹,Hristidis Vagelis¹

Affiliation:

1. University of California

Abstract

The increasing amount of spatial data calls for new scalable query processing techniques. One of the techniques that are getting attention is data synopsis, which summarizes the data using samples or histograms and computes an approximate answer based on the synopsis. This general technique is used in selectivity estimation, clustering, partitioning, load balancing, and visualization, among others. This paper experimentally studies four spatial data synopsis techniques for three common data analysis problems, namely, selectivity estimation, k-means clustering, and spatial partitioning. We run an extensive experimental evaluation on both real and synthetic datasets of up to 2.7 billion records to study the trade-offs between the synopsis methods and their applicability in big spatial data analysis. For each of the three problems, we compare with baseline techniques that operate on the whole dataset and evaluate the synopsis generation time, the time for computing an approximate answer on the synopsis, and the accuracy of the result. We present our observations about when each synopsis technique performs best.
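
As a rough illustration of the idea in the abstract (not the paper's implementation), the sketch below builds two synopses of a point dataset, a uniform random sample and a uniform grid histogram, and uses each to estimate the selectivity of a rectangular range query. All function names, the synthetic points in the unit square, and the assumption that points are spread evenly inside each grid cell are hypothetical choices made for this example.

```python
import random


def make_points(n, seed=0):
    """Synthetic points in the unit square (an assumption for this sketch)."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def in_box(point, box):
    """True if the point lies inside the query box (x1, y1, x2, y2)."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def exact_selectivity(points, box):
    """Baseline: scan the whole dataset."""
    return sum(in_box(p, box) for p in points) / len(points)


def sample_synopsis(points, k, seed=1):
    """Synopsis 1: a uniform random sample of k points."""
    return random.Random(seed).sample(points, k)


def sample_selectivity(sample, box):
    """Fraction of sampled points that fall inside the box."""
    return sum(in_box(p, box) for p in sample) / len(sample)


def histogram_synopsis(points, g):
    """Synopsis 2: point counts on a g x g uniform grid over the unit square."""
    counts = [[0] * g for _ in range(g)]
    for x, y in points:
        i = min(int(x * g), g - 1)  # clamp x == 1.0 into the last column
        j = min(int(y * g), g - 1)
        counts[i][j] += 1
    return counts


def histogram_selectivity(counts, box, total):
    """Weight each cell count by the fraction of the cell the box covers.

    Assumes points are spread evenly inside every cell.
    """
    g = len(counts)
    w = 1.0 / g
    x1, y1, x2, y2 = box
    estimate = 0.0
    for i in range(g):
        overlap_x = max(0.0, min(x2, (i + 1) * w) - max(x1, i * w))
        if overlap_x == 0.0:
            continue
        for j in range(g):
            overlap_y = max(0.0, min(y2, (j + 1) * w) - max(y1, j * w))
            estimate += counts[i][j] * (overlap_x * overlap_y) / (w * w)
    return estimate / total


if __name__ == "__main__":
    points = make_points(20000)
    box = (0.2, 0.3, 0.6, 0.7)
    sample = sample_synopsis(points, 2000)
    hist = histogram_synopsis(points, 20)
    print("exact    ", exact_selectivity(points, box))
    print("sample   ", sample_selectivity(sample, box))
    print("histogram", histogram_selectivity(hist, box, len(points)))
```

Both synopses are much smaller than the dataset, so the estimate is cheap to compute; the trade-off between synopsis size, build time, and accuracy is the kind of question the paper evaluates experimentally.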

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3342263.3342635

Cited by 12 articles.

1. Assessing water quality of Kazerun County in southwest Iran: Multi-analytical techniques, deterministic vs. probabilistic water quality index, geospatial analysis, fuzzy C-means clustering, and machine learning;Groundwater for Sustainable Development;2024-11

2. Cluster based similarity extraction upon distributed datasets;Cluster Computing;2023-08-25

3. SynopsisDB: Distributed Synopsis-based Data Processing System;Companion of the 2023 International Conference on Management of Data;2023-06-04

4. Beast;Proceedings of the 30th ACM International Conference on Information & Knowledge Management;2021-10-26

5. HQ-Filter: Hierarchy-Aware Filter For Empty-Resulting Queries in Interactive Exploration;2021 22nd IEEE International Conference on Mobile Data Management (MDM);2021-06