TriJoin: A Time-Efficient and Scalable Three-Way Distributed Stream Join System
Author:
Shuiying Yu, Yinting Zheng, Fan Zhang, Hanhua Chen, Hai Jin
Abstract
<p>Stream join is one of the most fundamental operations in data stream processing applications. Existing distributed stream join systems efficiently support two-way join, which is a join operation between two streams. When built on two-way joins, a three-way join must be split into two cascaded two-way joins, where the second two-way join has to wait for the join results transmitted from the first. We show through experiments that such a design incurs prohibitively high processing latency. To solve this problem, we propose TriJoin, a time-efficient three-way distributed stream join system. We design a symmetric wait-free structure based on symmetric tuple partitioning and reused join. TriJoin uses reused join to join each new tuple with the intermediate results of the other two streams and with locally stored tuples. For a new tuple, TriJoin only joins it with the intermediate results to generate the final results without waiting, greatly reducing the processing latency. In TriJoin, we design two partitioning and storage schemes according to two different forms of three-way stream join. We implement TriJoin and conduct comprehensive experiments to evaluate its performance using real-world traces. Results show that TriJoin reduces the processing latency by up to 68% compared to existing designs.</p>
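The reused-join idea summarized in the abstract can be illustrated with a toy, single-process sketch. This is not the TriJoin implementation: it assumes an equi-join of three streams on one shared key, no windowing, and no distributed partitioning, and the class and method names (`ReusedThreeWayJoin`, `insert`) are hypothetical. It only shows how keeping the pairwise intermediate results of the other two streams lets a new tuple produce final results in one step instead of waiting for a cascaded second join.

```python
from collections import defaultdict


class ReusedThreeWayJoin:
    """Toy three-way equi-join of streams R, S, T on one shared key (illustrative only)."""

    STREAMS = ("R", "S", "T")

    def __init__(self):
        # stored[stream][key] -> values seen so far on that stream
        self.stored = {s: defaultdict(list) for s in self.STREAMS}
        # intermediate[(a, b)][key] -> already-joined (value_a, value_b) pairs
        self.intermediate = {
            ("R", "S"): defaultdict(list),
            ("R", "T"): defaultdict(list),
            ("S", "T"): defaultdict(list),
        }

    def insert(self, stream, key, value):
        """Process one new tuple and return the final (r, s, t) results it completes."""
        others = tuple(s for s in self.STREAMS if s != stream)

        # 1. Final results: join the new tuple with the intermediate
        #    result of the other two streams; no second join to wait for.
        results = []
        for a, b in self.intermediate[others][key]:
            row = {others[0]: a, others[1]: b, stream: value}
            results.append((row["R"], row["S"], row["T"]))

        # 2. Reuse: extend the intermediate results that involve this stream,
        #    so that later tuples from the remaining stream can use them.
        for other in others:
            pair = tuple(sorted((stream, other), key=self.STREAMS.index))
            for v in self.stored[other][key]:
                entry = (value, v) if pair[0] == stream else (v, value)
                self.intermediate[pair][key].append(entry)

        # 3. Store the new tuple locally.
        self.stored[stream][key].append(value)
        return results


join = ReusedThreeWayJoin()
join.insert("R", "k", "r1")
join.insert("S", "k", "s1")
final = join.insert("T", "k", "t1")   # completes (r1, s1, t1)
```

In this sketch each final triple is emitted exactly once, when the last of its three tuples arrives. The cost is extra memory for the pairwise intermediate results, which is the kind of trade-off a real system would have to manage through its partitioning and storage schemes.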
Publisher
Angle Publishing Co., Ltd.
Subject
Computer Networks and Communications,Software