Analyzing efficient stream processing on modern hardware

Published:2019-01 Issue:5 Volume:12 Page:516-530
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Zeuch Steffen¹, Del Monte Bonaventura¹, Karimov Jeyhun¹, Lutz Clemens¹, Renz Manuel¹, Traub Jonas², Breß Sebastian³, Rabl Tilmann³, Markl Volker³

Affiliation:

1. German Research Center for Artificial Intelligence

2. Technische Universität Berlin

3. Technische Universität Berlin and German Research Center for Artificial Intelligence

Abstract

Modern Stream Processing Engines (SPEs) process large data volumes under tight latency constraints. Many SPEs execute processing pipelines using message passing on shared-nothing architectures and apply a partition-based scale-out strategy to handle high-velocity input streams. Furthermore, many state-of-the-art SPEs rely on a Java Virtual Machine to achieve platform independence and speed up system development by abstracting from the underlying hardware. In this paper, we show that taking the underlying hardware into account is essential to exploit modern hardware efficiently. To this end, we conduct an extensive experimental analysis of current SPEs and SPE design alternatives optimized for modern hardware. Our analysis highlights potential bottlenecks and reveals that state-of-the-art SPEs are not capable of fully exploiting current and emerging hardware trends, such as multi-core processors and high-speed networks. Based on our analysis, we describe a set of design changes to the common architecture of SPEs to scale up on modern hardware. We show that the single-node throughput can be increased by up to two orders of magnitude compared to state-of-the-art SPEs by applying specialized code generation, fusing operators, batch-style parallelization strategies, and optimized windowing. This speedup allows for deploying typical streaming applications on a single or a few nodes instead of large clusters.
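The abstract contrasts operators that exchange tuples through message passing with fused operators. The sketch below illustrates only that structural difference and is not the paper's code generator. It assumes a hypothetical query (filter on value, double the value, sum per key) over hypothetical `(key, value)` tuples; the function names `unfused` and `fused` are illustrative. The unfused version materializes each operator's output in a buffer before the next operator runs. The fused version pushes each tuple through all three operators in one loop, which is the kind of intermediate buffering that fusion is meant to avoid.

```python
from collections import deque
from typing import Iterable

# Hypothetical record type: (key, value) pairs from an input stream.
Record = tuple[int, int]


def unfused(records: Iterable[Record], threshold: int) -> dict[int, int]:
    """Each operator is a separate stage that hands tuples to the next
    stage through an explicit buffer (a stand-in for message passing)."""
    filter_out: deque = deque()
    for key, value in records:            # stage 1: filter
        if value > threshold:
            filter_out.append((key, value))

    map_out: deque = deque()
    while filter_out:                     # stage 2: map (double the value)
        key, value = filter_out.popleft()
        map_out.append((key, value * 2))

    sums: dict[int, int] = {}
    while map_out:                        # stage 3: keyed sum
        key, value = map_out.popleft()
        sums[key] = sums.get(key, 0) + value
    return sums


def fused(records: Iterable[Record], threshold: int) -> dict[int, int]:
    """Filter, map, and keyed sum fused into a single loop: each tuple is
    processed end to end without intermediate buffers."""
    sums: dict[int, int] = {}
    for key, value in records:
        if value > threshold:
            sums[key] = sums.get(key, 0) + value * 2
    return sums


if __name__ == "__main__":
    stream = [(1, 5), (2, 1), (1, 7), (2, 9)]
    print(fused(stream, 3))    # {1: 24, 2: 18}
    print(unfused(stream, 3))  # {1: 24, 2: 18}
```

Both functions return the same result. In Python the example shows only the shape of the transformation. Any throughput gain from fusion would come from compiled code that keeps each tuple in cache or registers across operators, which an interpreter does not reproduce.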

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3303753.3303758

Cited by 70 articles.

1. The Renoir Dataflow Platform: Efficient Data Processing without Complexity;Future Generation Computer Systems;2024-11

2. RR-Compound: RDMA-Fused gRPC for Low Latency, High Throughput, and Easy Interface;IEEE Transactions on Parallel and Distributed Systems;2024-08

3. NebulaStream - Data Stream Processing in Massively Distributed, Heterogeneous, Volatile Environments;Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems;2024-06-24

4. Benchmarking scalability of stream processing frameworks deployed as microservices in the cloud;Journal of Systems and Software;2024-02

5. Optimising Queries for Pattern Detection Over Large Scale Temporally Evolving Graphs;IEEE Access;2024