Abstract
In this paper, we propose the first deterministic algorithms to solve the frequency estimation and frequent items problems in the bounded-deletion model. We establish the space lower bound for solving the deterministic frequent items problem in the bounded-deletion model, and propose the Lazy SpaceSaving± and SpaceSaving± algorithms, which achieve the optimal space bound. We develop an efficient implementation of the SpaceSaving± algorithm that minimizes the latency of update operations using novel data structures. Experimental evaluations show that SpaceSaving± produces accurate frequency estimates and achieves very high recall and precision across different data distributions while using minimal space. Our experiments demonstrate that, given the same space, SpaceSaving± is more accurate than state-of-the-art protocols when up to (log U − 1)/log U of the items are deleted, where U is the size of the input universe. Moreover, motivated by prior work, we propose Dyadic SpaceSaving±, the first deterministic quantile approximation sketch in the bounded-deletion model.
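To make the setting concrete, the following is a minimal sketch of a counter-based summary that accepts both insertions and deletions. It is not the authors' implementation: the abstract does not spell out the update rules, so the insertion step follows the classic SpaceSaving eviction rule, and the deletion step (decrement a monitored item, ignore an unmonitored one) is an assumption made for illustration. The class name `LazySpaceSavingSketch` and its methods are hypothetical, and the linear scan for the minimum counter stands in for the efficient data structures the abstract mentions.

```python
from typing import Dict, Hashable


class LazySpaceSavingSketch:
    """Illustrative summary with at most k counters (hypothetical name)."""

    def __init__(self, k: int) -> None:
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k
        self.counts: Dict[Hashable, int] = {}

    def insert(self, item: Hashable) -> None:
        if item in self.counts:
            # Monitored item: increment its counter.
            self.counts[item] += 1
        elif len(self.counts) < self.k:
            # Free counter available: start monitoring the item.
            self.counts[item] = 1
        else:
            # Classic SpaceSaving step: evict an item with the minimum
            # counter; the new item inherits that count plus one.
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1

    def delete(self, item: Hashable) -> None:
        # Assumed lazy rule: decrement only if the item is monitored,
        # otherwise ignore the deletion.
        if item in self.counts:
            self.counts[item] -= 1

    def estimate(self, item: Hashable) -> int:
        # Unmonitored items are reported with frequency 0.
        return self.counts.get(item, 0)
```

A short usage example: with `sketch = LazySpaceSavingSketch(2)`, calling `sketch.insert(x)` for each item of a stream and `sketch.delete(x)` for each retraction keeps at most two counters, and `sketch.estimate(x)` returns the current counter for `x`.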
Publisher
Association for Computing Machinery (ACM)
Subject
General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development
Cited by
1 article.
1. Scalable Overspeed Item Detection in Streams;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13