Affiliation:
1. Tsinghua University, Beijing, China
Abstract
Columnar storage is now an industry-standard design in most open-source and commercial time series database products, making them HTAP systems. In the single-column storage scheme, the time column of a time series serves as the key identifying its single value column. When multiple time series share a similar set of timestamps, as is very likely in a module of multiple sensors, it is natural to group them together, i.e., one time column identifies multiple value columns in a single-group storage scheme. While having multiple value columns share the same time column reduces the space cost of repeated timestamps, it may introduce extra space cost for recording null values. The reason is that time series may not be exactly aligned on each timestamp, owing to missing values, distinct data collection frequencies, unsynchronized clocks, and so on. The column-groups storage scheme therefore divides the columns into multiple groups, within each of which the value columns share the same time column. Unfortunately, finding the optimal column groups with minimum space cost is highly challenging: the problem is NP-hard according to our analysis. We therefore propose a heuristic algorithm that automatically groups time series for efficient columnar storage. The column-groups storage has been deployed in Apache IoTDB, an open-source time series database. An extensive performance analysis over real-world data from our industrial partners demonstrates that the proposed column groups achieve near-optimal storage, more compact than that of the single-column or single-group schemes. Interestingly, both the flushing and querying time costs of column groups are comparable to those of single-column or single-group, i.e., they incur no extra time cost.
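The space trade-off among the three schemes can be illustrated with a minimal Python sketch. It is not the heuristic proposed in the paper: the byte costs (`TS_BYTES`, `VAL_BYTES`, `NULL_BYTES`), the pairwise greedy merge, and the sample series `s1` to `s3` are all assumptions chosen for illustration only.

```python
from itertools import combinations

# Assumed per-item storage costs (hypothetical, not taken from the paper).
TS_BYTES = 8    # one timestamp in a time column
VAL_BYTES = 8   # one non-null value in a value column
NULL_BYTES = 1  # one null marker for a missing value


def group_cost(group, series):
    """Space cost of storing all series in `group` under one shared time column."""
    shared = set().union(*(series[name] for name in group))
    present = sum(len(series[name]) for name in group)
    nulls = len(shared) * len(group) - present  # slots with no value
    return len(shared) * TS_BYTES + present * VAL_BYTES + nulls * NULL_BYTES


def total_cost(groups, series):
    """Total space cost of a partition of the series into column groups."""
    return sum(group_cost(g, series) for g in groups)


def greedy_groups(series):
    """Illustrative greedy: merge the pair of groups with the largest saving
    until no merge reduces the cost. Only a stand-in for a real heuristic."""
    groups = [frozenset([name]) for name in series]
    while True:
        best_saving, best_pair = 0, None
        for a, b in combinations(groups, 2):
            saving = (group_cost(a, series) + group_cost(b, series)
                      - group_cost(a | b, series))
            if saving > best_saving:
                best_saving, best_pair = saving, (a, b)
        if best_pair is None:
            return groups
        a, b = best_pair
        groups = [g for g in groups if g not in (a, b)] + [a | b]


# Hypothetical data: s1 and s2 are aligned, s3 is sampled at disjoint times.
series = {
    "s1": set(range(0, 100)),
    "s2": set(range(0, 100)),
    "s3": set(range(1000, 1100)),
}

single_column = [frozenset([name]) for name in series]
single_group = [frozenset(series)]
column_groups = greedy_groups(series)

print("single-column:", total_cost(single_column, series))
print("single-group: ", total_cost(single_group, series))
print("column groups:", total_cost(column_groups, series))
```

Under these assumed costs, grouping the two aligned series saves their repeated timestamps, while keeping the misaligned series separate avoids paying for null markers, so the column-groups partition costs less than either extreme.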
Funder
National Key Research and Development Plan
Publisher
Association for Computing Machinery (ACM)
Cited by
4 articles.