Affiliation:
1. University of Chinese Academy of Sciences
2. Peking University
Abstract
Per-key aggregation is indispensable in streaming data analysis and is typically formulated as two phases: an update phase and a recovery phase. As the size and speed of data streams grow, accurate per-key information becomes useful in many applications, such as anomaly detection, attack prevention, and online diagnosis. Although many algorithms have been proposed for per-key aggregation in stream processing, their accuracy guarantees cover only a small portion of keys. In this paper, we aim to achieve nearly full accuracy with limited resource usage. Following the line of sketch-based techniques, we observe that existing methods suffer from high errors for most keys. The reason is twofold: they track keys with complicated mechanisms in the update phase, and they simply calculate each per-key aggregation from a specific counter in the recovery phase. We therefore present PR-Sketch, a novel sketch design that addresses these two limitations. PR-Sketch builds linear equations between counter values and per-key aggregations to improve accuracy, and records keys in the recovery phase to reduce resource usage in the update phase. We also provide an extension, fast PR-Sketch, that further improves the processing rate. We derive the space complexity, time complexity, and guaranteed error probability of both PR-Sketch and fast PR-Sketch. We conduct trace-driven experiments with 100K keys and 1M items to compare our algorithms with multiple state-of-the-art methods. The results demonstrate the resource efficiency and nearly full accuracy of our algorithms.
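The idea of recovering per-key aggregations from linear equations over counters can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the PR-Sketch implementation: the names `LinearSketch`, `bucket`, and `solve` are hypothetical, the hash is an arbitrary choice, the set of keys is assumed to be known at recovery time, and the equations are solved by plain least squares (exact normal equations), which may differ from the solver used in the paper.

```python
import hashlib
from fractions import Fraction


def bucket(key, row, width):
    """Deterministic hash of (row, key) into [0, width). Hash choice is an assumption."""
    digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % width


def solve(M, n):
    """Gauss-Jordan elimination on an n x (n+1) augmented matrix of Fractions.
    Variables without a pivot (rank-deficient case) are left at 0."""
    pivots, row = [], 0
    for col in range(n):
        p = next((i for i in range(row, n) if M[i][col] != 0), None)
        if p is None:
            continue  # free variable
        M[row], M[p] = M[p], M[row]
        pv = M[row][col]
        M[row] = [v / pv for v in M[row]]
        for i in range(n):
            if i != row and M[i][col] != 0:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[row])]
        pivots.append((row, col))
        row += 1
    x = [Fraction(0)] * n
    for r, c in pivots:
        x[c] = M[r][n]
    return x


class LinearSketch:
    def __init__(self, rows, width):
        self.rows, self.width = rows, width
        self.counters = [[0] * width for _ in range(rows)]

    def update(self, key, value=1):
        # Update phase: add to one counter per row; no key tracking here.
        for r in range(self.rows):
            self.counters[r][bucket(key, r, self.width)] += value

    def recover(self, keys):
        # Recovery phase: one equation per counter, one unknown per key.
        keys = list(keys)
        n = len(keys)
        A, b = [], []
        for r in range(self.rows):
            for c in range(self.width):
                A.append([1 if bucket(k, r, self.width) == c else 0 for k in keys])
                b.append(self.counters[r][c])
        # Normal equations (A^T A) x = A^T b, solved exactly with Fractions.
        M = [[Fraction(sum(eq[i] * eq[j] for eq in A)) for j in range(n)]
             + [Fraction(sum(eq[i] * y for eq, y in zip(A, b)))]
             for i in range(n)]
        return dict(zip(keys, solve(M, n)))


# Toy stream: five keys with known aggregate counts.
truth = {"a": 5, "b": 1, "c": 7, "d": 2, "e": 3}
sketch = LinearSketch(rows=3, width=4)
for key, count in truth.items():
    for _ in range(count):
        sketch.update(key)
estimates = sketch.recover(truth)
```

The recovered estimates always reproduce every counter value exactly. They coincide with the true per-key counts when the equation system has full column rank, which depends on how the hash functions spread the keys; with too few counters relative to the number of keys, several keys become indistinguishable and the solution is no longer unique.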
Subject
General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development
Cited by
14 articles.
1. SatShield: In-Network Mitigation of Link Flooding Attacks for LEO Constellation Networks;IEEE Internet of Things Journal;2024-08-15
2. Lightweight Acquisition and Ranging of Flows in the Data Plane;ACM SIGMETRICS Performance Evaluation Review;2024-06-11
3. Lightweight Acquisition and Ranging of Flows in the Data Plane;Abstracts of the 2024 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems;2024-06-10
4. DISCO: A Dynamically Configurable Sketch Framework in Skewed Data Streams;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13
5. CodingSketch: A Hierarchical Sketch with Efficient Encoding and Recursive Decoding;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13