Affiliation:
1. Institute of Computing Technology, CAS & University of Chinese Academy of Sciences, Beijing, China
Abstract
Time series data are used in a wide variety of applications. The explosive growth in the volume of time series data poses a significant challenge for efficient data storage and query processing. Unfortunately, existing compression techniques either achieve only low to medium compression ratios on time series data or incur significant decompression overhead during query processing.
We propose a novel compression technique, MOST (Model-based compression with Outlier STorage), for time series data. As measurement values often change smoothly over a period of time, we divide a time series into segments of smooth changes, then compute a linear model for each segment. Since tiny errors are often acceptable in analysis tasks, we omit data points whose computed values are within a pre-specified error threshold of the actual values, thereby effectively reducing the data size. Outliers are rare but important for many applications, and therefore we store outliers explicitly. Moreover, for processing MOST-compressed data, we propose a segment-outlier dual-mode query engine that computes segments as a whole as much as possible, and we build a prototype, MostDB. Experimental results on real-world data sets show that MOST achieves 9.45-15.04x compression ratios. Compared to existing time series databases, MostDB achieves up to 11.68x speedups for common queries from the IoTDB Benchmark.
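The idea described above can be illustrated with a minimal Python sketch. It is not the MOST algorithm or the MostDB engine, whose details the abstract does not give. The names `Segment`, `compress`, `decompress`, and `segment_sum` are hypothetical, and the greedy rules used here are assumptions made for illustration: each line is fitted through the first two points of a segment, and an isolated deviating point is treated as an outlier only when the next point fits the line again.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: int         # index of the first point covered
    end: int           # index of the last point covered (inclusive)
    slope: float       # change in value per index step
    intercept: float   # model value at index `start`
    outliers: dict = field(default_factory=dict)  # index -> exact value

    def predict(self, i):
        """Value of the linear model at index i."""
        return self.intercept + self.slope * (i - self.start)


def compress(values, eps):
    """Greedy segmentation (an assumed heuristic, not the paper's method)."""
    segments = []
    n = len(values)
    i = 0
    while i < n:
        # Line through the first two points; flat if only one point remains.
        slope = float(values[i + 1] - values[i]) if i + 1 < n else 0.0
        seg = Segment(i, min(i + 1, n - 1), slope, float(values[i]))
        j = i + 2
        while j < n:
            if abs(seg.predict(j) - values[j]) <= eps:
                seg.end = j                   # point is implied by the model
            elif j + 1 < n and abs(seg.predict(j + 1) - values[j + 1]) <= eps:
                seg.outliers[j] = values[j]   # isolated spike: store exactly
                seg.end = j
            else:
                break                         # trend changed: close the segment
            j += 1
        segments.append(seg)
        i = seg.end + 1
    return segments


def decompress(segments):
    """Rebuild the series: exact outliers, model values elsewhere."""
    out = []
    for seg in segments:
        for i in range(seg.start, seg.end + 1):
            out.append(seg.outliers.get(i, seg.predict(i)))
    return out


def segment_sum(seg):
    """Sum one segment in closed form, then correct it for stored outliers."""
    count = seg.end - seg.start + 1
    total = count * seg.intercept + seg.slope * (count - 1) * count / 2
    for i, actual in seg.outliers.items():
        total += actual - seg.predict(i)
    return total


values = [0, 1, 2, 3, 50, 5, 6, 7, 10, 10, 10, 10]
segments = compress(values, eps=0.5)
restored = decompress(segments)
total = sum(segment_sum(s) for s in segments)
```

In this sketch, `segment_sum` hints at what computing "segments as a whole" could mean for an aggregate such as SUM: the model contributes a closed-form term per segment, and only the stored outliers are visited individually. Non-outlier values are reconstructed to within `eps` of the originals rather than exactly.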
Publisher
Association for Computing Machinery (ACM)
Cited by
1 article.