Affiliation:
1. Harbin Institute of Technology , Weihai 264209, China
2. Universidad Politécnica de Madrid , Madrid 28031, Spain
Abstract
Abstract
The sketch is one of the typical and widely used data structures for estimating the frequencies of items in data streams. However, since the counter sizes in traditional rectangular sketch (r-sketch) are the same, it is hard to achieve small space usage, high capacity (i.e. the maximum frequency can be recorded) and high estimated accuracy simultaneously. Moreover, when considering the high skewness of data streams, this problem will become even worse. Consequently, we propose the trapezoidal sketch (t-sketch) in this paper. In the t-sketch, different from the r-sketch, the counter sizes in different layers are different. Therefore, the low space usage and high capacity can be achieved simultaneously in the t-sketch. Moreover, based on the basic t-sketch, we propose the space-saving t-sketch and the capacity-improvement t-sketch and analyze the properties of these two t-sketches. Finally, for improving the estimation accuracy of the t-sketch further, we propose the probabilistic-based estimation error-reducing algorithm. Compared with the CM sketch, CU sketch, C sketch and A sketch, the simulation results show that the performances on space usage, capacity and estimation accuracy are improved successfully by the space-saving t-sketch and the capacity-improvement t-sketch.
Publisher
Oxford University Press (OUP)