Affiliation:
1. Harbin Institute of Technology, Harbin, China
Abstract
Stream join is a fundamental operation in stream processing and has attracted extensive research because of its large resource consumption and serious impact on system performance. As the theoretical basis of stream join systems, the stream join model greatly affects system performance. State-of-the-art stream join models consume either too many computing resources or too many storage resources, which results in lower throughput or higher latency. In this paper, we propose a new stream join model for processing arbitrary join predicates, called CoModel, which offers a flexible trade-off between memory and computing resource consumption. More importantly, CoModel achieves the minimum sum of the number of store operations and join operations among all existing join models, and thus achieves the lowest latency and highest throughput when the overhead of the local stream join is approximately constant for each input tuple. We give a trade-off strategy for CoModel and theoretically prove its performance advantages using queuing theory. Furthermore, we design and implement an adaptive distributed stream join system, CoStream, based on CoModel. CoStream adaptively adjusts its structure according to resource constraints and the statistics of the input data. We conduct extensive experiments on CoStream to evaluate its performance and adaptivity, and the results show that CoStream has the lowest latency and highest throughput in various scenarios.
Funder
National Natural Science Foundation of China
Publisher
Association for Computing Machinery (ACM)