MOST: Model-Based Compression with Outlier Storage for Time Series Data

Published:2023-12-08 Issue:4 Volume:1 Page:1-29
ISSN:2836-6573
Container-title:Proceedings of the ACM on Management of Data
language:en
Short-container-title:Proc. ACM Manag. Data

Author:

Yang Zehai¹, Chen Shimin¹

Affiliation:

1. Institute of Computing Technology, CAS & University of Chinese Academy of Sciences, Beijing, China

Abstract

Time series data are used in a wide variety of applications. The explosive growth in the amount of time series data poses a significant challenge for efficient data storage and query processing. Unfortunately, existing compression techniques either achieve only low to medium compression ratios on time series data, or incur significant decompression overhead during query processing. We propose a novel compression technique, MOST (Model-based compression with Outlier STorage), for time series data. As measurement values often change smoothly over a period of time, we divide a time series into segments of smooth changes, then compute a linear model for each segment. Since tiny errors are often acceptable in analysis tasks, we omit data points whose computed values are within a pre-specified error threshold of the actual values, thereby effectively reducing the data size. Outliers are rare but important for many applications, and therefore we store outliers explicitly. Moreover, for processing MOST-compressed data, we propose a segment-outlier dual-mode query engine that computes segments as a whole as much as possible, and we build a prototype, MostDB. Experimental results on real-world data sets show that MOST achieves 9.45-15.04x compression ratios. Compared to existing time series databases, MostDB achieves up to 11.68x speedups for common queries from the IoTDB Benchmark.
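The idea in the abstract (linear segments, an error threshold, explicit outliers) can be illustrated with a minimal sketch. This is not the paper's algorithm or storage format, which the abstract does not specify. It assumes evenly spaced samples, fits each segment's line through its first two points, and uses a hypothetical `max_outlier_run` parameter to decide when a run of misses ends a segment instead of being stored as outliers. The names `Segment`, `compress`, and `decompress` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: int        # index of the first point covered by this segment
    length: int       # number of points covered
    slope: float      # change in value per step
    intercept: float  # model value at index `start`
    outliers: dict = field(default_factory=dict)  # index -> exact value


def compress(values, eps, max_outlier_run=1):
    """Greedy sketch: extend a line until too many consecutive points miss it."""
    segments = []
    n = len(values)
    i = 0
    while i < n:
        # Assumption: the line passes through the first two points of the segment.
        slope = values[i + 1] - values[i] if i + 1 < n else 0.0
        seg = Segment(start=i, length=1, slope=slope, intercept=values[i])
        pending = {}  # consecutive misses not yet confirmed as outliers
        j = i + 1
        while j < n:
            predicted = seg.intercept + seg.slope * (j - seg.start)
            if abs(values[j] - predicted) <= eps:
                # The series returned to the line: pending misses are outliers.
                seg.outliers.update(pending)
                pending = {}
                seg.length = j - seg.start + 1
            else:
                pending[j] = values[j]
                if len(pending) > max_outlier_run:
                    break  # too many misses in a row: the trend has changed
            j += 1
        segments.append(seg)
        i = seg.start + seg.length  # unconfirmed misses start the next segment
    return segments


def decompress(segments):
    """Rebuild the series: exact values for outliers, model values otherwise."""
    out = []
    for seg in segments:
        for k in range(seg.length):
            idx = seg.start + k
            out.append(seg.outliers.get(idx, seg.intercept + seg.slope * k))
    return out
```

In this sketch, every reconstructed non-outlier value lies within `eps` of the original, while outliers are restored exactly. A segment costs a fixed number of fields regardless of how many points it covers, which is where the size reduction would come from under these assumptions.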

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3626737

References: 94 articles.

1. BlinkDB

2. Bikash Agrawal. 2013. Analysis of large time-series data in OpenTSDB. Master's thesis. University of Stavanger, Norway.

3. Brotli

4. Detecting distance-based outliers in streams of data

Cited by 1 article.

1. Flexible grouping of linear segments for highly accurate lossy compression of time series data;The VLDB Journal;2024-07-15