Affiliation:
1. Mahidol University International College
2. IBM Research
3. Meta
Abstract
Sliding-window aggregation is a foundational stream processing primitive that efficiently summarizes recent data. The state-of-the-art algorithms for sliding-window aggregation are highly efficient when stream data items are evicted or inserted one at a time, even when some of the insertions occur out-of-order. However, real-world streams are often not only out-of-order but also bursty, causing data items to be evicted or inserted in larger bulks. This paper introduces a new algorithm for sliding-window aggregation with bulk eviction and bulk insertion. For the special case of single insert and evict, our algorithm matches the theoretical complexity of the best previous out-of-order algorithms. For the case of bulk evict, our algorithm improves upon the theoretical complexity of the best previous algorithm for that case and also outperforms it in practice. For the case of bulk insert, there are no prior algorithms, and our algorithm improves upon the naive approach of emulating bulk insert with a loop over single inserts, both in theory and in practice. Overall, this paper makes high-performance algorithms for sliding-window aggregation more broadly applicable by efficiently handling the ubiquitous cases of out-of-order data and bursts.
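To make the operations named in the abstract concrete, the following is a minimal Python sketch of the naive baseline it mentions: bulk insert emulated by a loop over single inserts. It is not the paper's algorithm. The class name `NaiveWindow`, its method names, and the convention that `bulk_evict(t)` removes every item with timestamp at most `t` are assumptions made for illustration only.

```python
import bisect
from functools import reduce


class NaiveWindow:
    """Hypothetical baseline: an out-of-order window kept as a timestamp-sorted list."""

    def __init__(self, combine, identity):
        self.combine = combine    # associative aggregation operator, e.g. max
        self.identity = identity  # identity element for combine
        self.times = []           # timestamps, kept sorted
        self.values = []          # values aligned with self.times

    def insert(self, t, v):
        # Single insert; an out-of-order timestamp lands in the middle of the list.
        i = bisect.bisect_right(self.times, t)
        self.times.insert(i, t)
        self.values.insert(i, v)

    def bulk_insert(self, items):
        # Naive emulation of bulk insert: one single insert per item.
        for t, v in items:
            self.insert(t, v)

    def bulk_evict(self, t):
        # Assumed semantics: drop every item whose timestamp is <= t.
        i = bisect.bisect_right(self.times, t)
        del self.times[:i]
        del self.values[:i]

    def query(self):
        # Recompute the aggregate from scratch; linear in the window size.
        return reduce(self.combine, self.values, self.identity)
```

For example, with `w = NaiveWindow(max, float("-inf"))`, calling `w.bulk_insert([(1, 5), (4, 2), (2, 9), (3, 1)])` and then `w.bulk_evict(2)` leaves only the items at timestamps 3 and 4 in the window. Every insert and every query in this sketch costs time linear in the window size, which is the kind of overhead that dedicated sliding-window aggregation algorithms aim to avoid.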
Publisher
Association for Computing Machinery (ACM)
Subject
General Earth and Planetary Sciences, Water Science and Technology, Geography, Planning and Development
Cited by
1 article.
1. μWheel: Aggregate Management for Streams and Queries;Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems;2024-06-24