Hierarchical synopses with optimal error guarantees

Author:

Karras Panagiotis (1), Mamoulis Nikos (2)

Affiliation:

1. National University of Singapore, Law Link, Singapore

2. University of Hong Kong, Pokfulam Road, Hong Kong

Abstract

Hierarchical synopsis structures offer a viable alternative to traditional summarization techniques such as histograms in terms of efficiency and flexibility. Previous research on such structures has mostly focused on a single model, based on the Haar wavelet decomposition. In previous work, we introduced a more refined, wavelet-inspired hierarchical index structure for synopsis construction: the Haar+ tree. The chief advantages of this structure are twofold. First, it achieves higher synopsis quality than state-of-the-art histogram and Haar wavelet techniques when summarizing data sets with sharp discontinuities. Second, thanks to its search space delimitation capacity, Haar+ synopsis construction operates in time linear in the size of the data set for any monotonic distributive error metric. Contemporaneous research has introduced another hierarchical synopsis structure, the compact hierarchical histogram (CHH). In this article, we elaborate on both structures. First, we formally prove that the CHH, in its default binary-hierarchy form, is a simplified variant of a Haar+ tree. We then focus on the summarization problem, for both hierarchical synopsis structures, in which an error guarantee expressed by a maximum-error metric is required. We show that this problem is most efficiently solved through its dual, space-minimization counterpart, which can also achieve optimal quality. In this case, there is a benefit to be gained by specializing the algorithm for each structure. Our algorithm for optimal-quality maximum-error CHH requires low polynomial time. Optimal-quality Haar+ synopses for maximum-error metrics, on the other hand, are constructed in exponential time; hence, we also develop a low-polynomial-time approximation scheme for the maximum-error Haar+ case. Furthermore, we extend our approach for both general-error and maximum-error Haar+ synopses to arbitrary dimensionality.
In our experimental study, (i) we confirm the theoretically expected superiority of Haar+ synopses over Haar wavelet methods in both construction time and achieved quality for representative error metrics; (ii) we demonstrate that Haar+ synopses are also constructed faster than optimal plain histograms and, moreover, achieve higher synopsis quality with highly discontinuous data sets; such an advantage of a hierarchical synopsis structure over a histogram had been intuitively expressed, but never experimentally verified; and (iii) we show that Haar+ synopsis quality surpasses that of a CHH.
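The duality between the error-bounded and space-bounded problems mentioned in the abstract can be illustrated with a minimal sketch. It is not the article's algorithm: it uses a plain (non-hierarchical) histogram with the maximum absolute error metric, rather than a Haar+ tree or CHH, because the idea is easiest to see there. The function names `min_buckets` and `min_error_for_budget` are hypothetical. The first solves the space-minimization problem for a given error bound; the second solves the space-bounded problem by binary search over candidate error values, calling the first as a subroutine.

```python
from typing import List, Sequence, Tuple


def min_buckets(data: Sequence[float], eps: float) -> List[Tuple[int, int, float]]:
    """Space minimization for a plain histogram (illustrative assumption).

    Returns the fewest contiguous buckets (start, end, representative) such
    that every value lies within eps of its bucket's representative.
    Assumes data is non-empty.
    """
    buckets = []
    start = 0
    lo = hi = data[0]
    for i in range(1, len(data)):
        new_lo, new_hi = min(lo, data[i]), max(hi, data[i])
        if (new_hi - new_lo) / 2 <= eps:
            # The midpoint of the range still covers every value: extend.
            lo, hi = new_lo, new_hi
        else:
            # Close the current bucket and open a new one at position i.
            buckets.append((start, i - 1, (lo + hi) / 2))
            start = i
            lo = hi = data[i]
    buckets.append((start, len(data) - 1, (lo + hi) / 2))
    return buckets


def min_error_for_budget(data: Sequence[float], budget: int) -> float:
    """Space-bounded problem solved through its dual.

    Binary-searches the smallest candidate error for which min_buckets
    needs at most `budget` buckets. Assumes budget >= 1.
    """
    # The optimal maximum error is half the value range of some interval.
    candidates = set()
    for i in range(len(data)):
        lo = hi = data[i]
        for j in range(i, len(data)):
            lo, hi = min(lo, data[j]), max(hi, data[j])
            candidates.add((hi - lo) / 2)
    errors = sorted(candidates)

    left, right = 0, len(errors) - 1
    while left < right:
        mid = (left + right) // 2
        if len(min_buckets(data, errors[mid])) <= budget:
            right = mid  # feasible: try a smaller error
        else:
            left = mid + 1  # infeasible: a larger error is needed
    return errors[left]


data = [1, 1, 2, 9, 10, 10, 3, 3]
best = min_error_for_budget(data, 3)
synopsis = min_buckets(data, best)
```

The binary search is valid because the number of buckets needed never increases as the error bound grows. The quadratic enumeration of candidate errors is chosen for brevity and is not meant to reflect the complexity bounds stated in the abstract.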

Funder

Research Grants Council, University Grants Committee, Hong Kong

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems

Cited by 20 articles.

1. k-Best Egalitarian Stable Marriages for Task Assignment;Proceedings of the VLDB Endowment;2023-07

2. Adaptive Indexing in High-Dimensional Metric Spaces;Proceedings of the VLDB Endowment;2023-06

3. Atrapos: Real-time Evaluation of Metapath Query Workloads;Proceedings of the ACM Web Conference 2023;2023-04-30

4. Marigold: Efficient k-Means Clustering in High Dimensions;Proceedings of the VLDB Endowment;2023-03

5. GRASP: Scalable Graph Alignment by Spectral Corresponding Functions;ACM Transactions on Knowledge Discovery from Data;2023-02-24
