On random sampling over joins

Published: 1999-06, Volume: 28, Issue: 2, Pages: 263-274
ISSN: 0163-5808
Journal: ACM SIGMOD Record (SIGMOD Rec.)
Language: English

Authors:

Surajit Chaudhuri¹, Rajeev Motwani², Vivek Narasayya¹

Affiliations:

1. Microsoft Research

2. Stanford University

Abstract

A major bottleneck in implementing sampling as a primitive relational operation is the inefficiency of sampling the output of a query. It is not even known whether it is possible to generate a sample of a join tree without first evaluating the join tree completely. We undertake a detailed study of this problem and attempt to analyze it in a variety of settings. We present theoretical results explaining the difficulty of this problem and setting limits on the efficiency that can be achieved. Based on new insights into the interaction between join and sampling, we develop join sampling techniques for the settings where our negative results do not apply. Our new sampling algorithms are significantly more efficient than those known earlier. We present an experimental evaluation of our techniques on Microsoft's SQL Server 7.0.
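The core difficulty shows up already on a two-way equi-join R1 ⋈ R2: taking a uniform sample of R1 and joining each sampled tuple with a matching R2 tuple over-represents R1 tuples whose join value is rare in R2, so the result is not a uniform sample of the join. The Python sketch below illustrates one standard remedy under stated assumptions: the frequency of each join value in R2 is known, and R2 can be probed by join value (simulated here with an in-memory dictionary rather than a real index). Each R1 tuple is drawn with probability proportional to its number of R2 matches, and then a uniformly random matching R2 tuple is chosen. This is an illustration of the general idea, not a reproduction of the paper's algorithms; `join_sample_with_replacement` and `naive_sample_then_join` are hypothetical names.

```python
import random
from collections import defaultdict


def join_sample_with_replacement(r1, r2, key1, key2, n, rng=random):
    """Return n tuples drawn uniformly at random, with replacement, from
    r1 JOIN r2 ON r1[key1] == r2[key2], without materializing the join.

    Hypothetical helper for illustration. Each r1 tuple t1 is chosen with
    probability m2(t1)/J, where m2(t1) is the number of r2 tuples matching
    t1's join value and J = |r1 JOIN r2|. A matching r2 tuple is then
    chosen uniformly (probability 1/m2(t1)), so every join result has
    probability 1/J.
    """
    # Stand-in for an index on r2's join column: join value -> matching tuples.
    index = defaultdict(list)
    for t2 in r2:
        index[t2[key2]].append(t2)

    # Weight of each r1 tuple = frequency of its join value in r2.
    weights = [len(index.get(t1[key1], ())) for t1 in r1]
    if sum(weights) == 0:
        return []  # empty join: nothing to sample

    # Weighted sampling with replacement from r1, then a uniform partner in r2.
    picks = rng.choices(r1, weights=weights, k=n)
    return [(t1, rng.choice(index[t1[key1]])) for t1 in picks]


def naive_sample_then_join(r1, r2, key1, key2, n, rng=random):
    """Biased baseline: sample matching r1 tuples uniformly, then join."""
    index = defaultdict(list)
    for t2 in r2:
        index[t2[key2]].append(t2)
    matched = [t1 for t1 in r1 if index.get(t1[key1])]  # drop dangling r1 tuples
    if not matched:
        return []
    picks = [rng.choice(matched) for _ in range(n)]
    return [(t1, rng.choice(index[t1[key1]])) for t1 in picks]


if __name__ == "__main__":
    # Skewed toy data: join value "a" occurs 99 times in r2, "b" only once.
    r1 = [{"k": "a", "id": 1}, {"k": "b", "id": 2}]
    r2 = [{"k": "a", "j": i} for i in range(99)] + [{"k": "b", "j": 99}]
    rng = random.Random(0)
    for name, fn in [("weighted", join_sample_with_replacement),
                     ("naive", naive_sample_then_join)]:
        sample = fn(r1, r2, "k", "k", 10_000, rng)
        share_a = sum(t1["k"] == "a" for t1, _ in sample) / len(sample)
        print(f"{name}: share of sampled join tuples with k='a' = {share_a:.3f}")
```

In the toy data, 99 of the 100 join results have `k='a'`, so the weighted sampler should report a share close to 0.99, while the naive baseline drifts toward 0.5. The sketch still scans all of R1 and depends on R2 statistics plus random access to R2; when that information is unavailable, the approach does not apply as written.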

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/304181.304206

References (13 articles)

1. Random sampling for histogram construction

2. Bifocal sampling for skew-resistant join size estimation

3. On the relative cost of sampling for join selectivity estimation

4. Online aggregation

5. Error-constrained COUNT query evaluation in relational databases

Cited by 109 articles

1. Reservoir Sampling over Joins; Proceedings of the ACM on Management of Data; 2024-05-29

2. Learning manifolds from non-stationary streams; Journal of Big Data; 2024-03-23

3. Plexus; Proceedings of the 2023 ACM Symposium on Cloud Computing; 2023-10-30

4. ALECE: An Attention-based Learned Cardinality Estimator for SPJ Queries on Dynamic Workloads; Proceedings of the VLDB Endowment; 2023-10

5. ShadowAQP: Efficient Approximate Group-by and Join Query via Attribute-Oriented Sample Size Allocation and Data Generation; Proceedings of the VLDB Endowment; 2023-09