Affiliation:
1. Imperial College London
Abstract
Single-node multi-core stream processing engines (SPEs) can process hundreds of millions of tuples per second. Yet making them fault-tolerant with exactly-once semantics while retaining this performance is an open challenge: due to the limited I/O bandwidth of a single node, it becomes infeasible to persist all stream data and operator state during execution. Instead, single-node SPEs rely on upstream distributed systems, such as Apache Kafka, to recover stream data after failure, necessitating complex cluster-based deployments. This lack of built-in fault-tolerance features has hindered the adoption of single-node SPEs.
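As a rough illustration of the bandwidth problem, assume 32-byte tuples arriving at 200 million tuples per second, the throughput the abstract reports for Scabbard. The 32-byte tuple size is an assumption chosen for illustration, not a figure from the paper:

$$
2 \times 10^{8}\,\tfrac{\text{tuples}}{\text{s}} \times 32\,\tfrac{\text{B}}{\text{tuple}} = 6.4 \times 10^{9}\,\tfrac{\text{B}}{\text{s}} \approx 6.4\,\text{GB/s}
$$

Under this assumption, persisting only the raw input would already require a sustained write bandwidth of about 6.4 GB/s before any operator state is written, which can exceed what the local storage of a single server sustains.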
We describe Scabbard, the first single-node SPE that supports exactly-once fault-tolerance semantics despite limited local I/O bandwidth. Scabbard achieves this by integrating persistence operations with the query workload. Within the operator graph, Scabbard determines when to persist streams based on the selectivity of operators: by persisting streams after operators that discard data, it can substantially reduce the required I/O bandwidth. As part of the operator graph, Scabbard supports parallel persistence operations and uses markers to decide when to discard persisted data. The persisted data volume is further reduced using workload-specific compression: Scabbard monitors stream statistics and dynamically generates computationally efficient compression operators. Our experiments show that Scabbard can execute stream queries that process over 200 million tuples per second while recovering from failures with sub-second latencies.
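The placement and compression ideas can be illustrated with a small Python sketch. It models a linear pipeline in which each operator has an estimated selectivity and output tuple size, and picks the position whose output needs the least write bandwidth. The `Operator` class, the `stateless` flag, the rule of only persisting up to the first stateful operator, and all numbers are hypothetical assumptions for illustration, not Scabbard's actual cost model. The `frame_of_reference` helper is a simple stand-in for Scabbard's statistics-driven compression operators, chosen because it needs only the minimum and maximum of the observed values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Operator:
    """Hypothetical description of one operator in a linear pipeline."""
    name: str
    selectivity: float    # assumed fraction of input tuples the operator emits
    out_tuple_bytes: int  # size of one output tuple in bytes
    stateless: bool       # sketch assumption: only stateless outputs are candidates


def persist_point(input_rate: float, input_tuple_bytes: int,
                  ops: List[Operator]) -> Tuple[str, float]:
    """Return (position, bytes/s) of the cheapest place to persist the stream.

    Candidates are the raw source and the output of each operator up to the
    first stateful one; past that point this sketch would also need state
    checkpoints, which it does not model.
    """
    best = ("source", input_rate * input_tuple_bytes)
    rate = input_rate  # tuples per second flowing out of the current position
    for op in ops:
        if not op.stateless:
            break
        rate *= op.selectivity
        bandwidth = rate * op.out_tuple_bytes
        if bandwidth < best[1]:
            best = (op.name, bandwidth)
    return best


def frame_of_reference(values: List[int]) -> Tuple[int, int, List[int]]:
    """Encode a non-empty list of integers as offsets from its minimum.

    Returns (base, bits, offsets), where bits is the width a generated
    compressor could pack every offset into.
    """
    base = min(values)
    offsets = [v - base for v in values]
    bits = max(max(offsets).bit_length(), 1)
    return base, bits, offsets


def decode(base: int, offsets: List[int]) -> List[int]:
    """Invert frame_of_reference by adding the base back to each offset."""
    return [base + o for o in offsets]


# Usage with made-up numbers: 200M tuples/s of 32-byte tuples.
pipeline = [
    Operator("filter", selectivity=0.1, out_tuple_bytes=32, stateless=True),
    Operator("project", selectivity=1.0, out_tuple_bytes=16, stateless=True),
    Operator("aggregate", selectivity=0.01, out_tuple_bytes=16, stateless=False),
]
name, bandwidth = persist_point(2e8, 32, pipeline)  # ('project', 320000000.0)
base, bits, offsets = frame_of_reference([1000, 1003, 1001, 1015])  # 4 bits each
```

With these assumed numbers, persisting after the projection needs about 320 MB/s instead of 6.4 GB/s at the source, a 20x reduction, and the sample values fit in 4-bit offsets. Per the abstract, Scabbard derives such decisions from monitored stream statistics; this sketch simply takes them as constants.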
Publisher
Association for Computing Machinery (ACM)
Subject
General Earth and Planetary Sciences, Water Science and Technology, Geography, Planning and Development
References
110 articles.
Cited by
5 articles.
1. A survey on transactional stream processing. The VLDB Journal, 2023-09-27.
2. Parallelizing Stream Compression for IoT Applications on Asymmetric Multicores. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 2023-04.
3. CompressStreamDB: Fine-Grained Adaptive Stream Processing without Decompression. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 2023-04.
4. TiLT: A Time-Centric Approach for Stream Query Optimization and Parallelization. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 2023-01-27.
5. Region-based Sub-Snapshot (RegSnap): Enhanced Fault Tolerance in Distributed Stream Processing with Partial Snapshot. 2022 IEEE International Conference on Big Data (Big Data), 2022-12-17.