Massive scale-out of expensive continuous queries

Published:2011-08 Issue:11 Volume:4 Page:1181-1188
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Zeitler Erik¹, Risch Tore¹

Affiliation:

1. Uppsala University

Abstract

Scalable execution of expensive continuous queries over massive data streams requires input streams to be split into parallel sub-streams. The query operators are continuously executed in parallel over these sub-streams. Stream splitting involves both partitioning and replication of incoming tuples, depending on how the continuous query is parallelized. We provide a stream splitting operator that enables such customized stream splitting. However, it is critical that the stream splitting itself keeps up with input streams of high volume. This is a problem when the stream splitting predicates have some costs. Therefore, to enable customized splitting of high-volume streams, we introduce a parallelized stream splitting operator, called parasplit. We investigate the performance of parasplit using a cost model and experimentally. Based on these results, a heuristic is devised to automatically parallelize the execution of parasplit. We show that the maximum stream rate of parasplit is network bound, and that the parallelization is energy efficient. Finally, the scalability of our approach is experimentally demonstrated on the Linear Road Benchmark, showing an order of magnitude higher stream processing rate over previously published results, allowing at least 512 expressways.

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3402707.3402752

Cited by 17 articles.

1. Monitoring Big Data Streams Using Data Stream Management Systems: Industrial Needs, Challenges, and Improvements;Advances in Operations Research;2023-06-27

2. And synopses for all: A synopses data engine for extreme scale analytics-as-a-service;Information Systems;2023-06

3. SASPAR: Shared Adaptive Stream Partitioning;2023 IEEE 39th International Conference on Data Engineering (ICDE);2023-04

4. Jarvis: Large-scale Server Monitoring with Adaptive Near-data Processing;2022 IEEE 38th International Conference on Data Engineering (ICDE);2022-05

5. Predictive topology refinements in distributed stream processing system;PLOS ONE;2020-11-05