Affiliation:
1. Information Sciences Research Center, Bell Laboratories
2. Department of Computer Science, Tel-Aviv University
Abstract
In large data recording and warehousing environments, it is often advantageous to provide fast, approximate answers to queries, whenever possible. Before DBMSs providing highly-accurate approximate answers can become a reality, many new techniques for summarizing data and for estimating answers from summarized data must be developed. This paper introduces two new sampling-based summary statistics, concise samples and counting samples, and presents new techniques for their fast incremental maintenance regardless of the data distribution. We quantify their advantages over standard sample views in terms of the number of additional sample points for the same view size, and hence in providing more accurate query answers. Finally, we consider their application to providing fast approximate answers to hot list queries. Our algorithms maintain their accuracy in the presence of ongoing insertions to the data warehouse.
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems, Software
Cited by
79 articles.