Abstract
To ensure critical infrastructure is operating as expected, high-quality sensors are increasingly installed. However, due to the enormous amounts of high-frequency time series they produce, it is impossible or infeasible to transfer or even store these time series in the cloud when using state-of-the-practice compression methods. Thus, simple aggregates, e.g., 1–10-minute averages, are stored instead of the raw time series. However, by only storing these simple aggregates, informative outliers and fluctuations are lost. Many Time Series Management Systems (TSMSs) have been proposed to efficiently manage time series, but they are generally designed for either the edge or the cloud. In this paper, we describe a new version of the open-source model-based TSMS ModelarDB. The system is designed to be modular and the same binary can be efficiently deployed on the edge and in the cloud. It also supports continuously transferring high-frequency time series compressed using models from the edge to the cloud. We first provide an overview of ModelarDB, analyze the requirements and limitations of the edge, and evaluate existing query engines and data stores for use on the edge. Then, we describe how ModelarDB has been extended to efficiently manage time series on the edge, a novel file-based data store, how ModelarDB’s compression has been improved by not storing time series that can be derived from base time series, and how ModelarDB transfers high-frequency time series from the edge to the cloud. As the work that led to ModelarDB began in 2015, we also reflect on the lessons learned while developing it.
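The contrast the abstract draws between simple aggregates and model-based compression can be illustrated with a small sketch. It is not ModelarDB's implementation: the abstract does not specify which models the system uses, so the greedy piecewise-constant model, the absolute error bound, and all names here (`window_averages`, `constant_segments`, `reconstruct`, the sample `readings`) are assumptions made for illustration only.

```python
from statistics import mean


def window_averages(values, window):
    """Simple aggregate: one average per fixed-size window."""
    return [mean(values[i:i + window]) for i in range(0, len(values), window)]


def constant_segments(values, error_bound):
    """Greedy piecewise-constant model under an absolute error bound.

    Returns (start_index, length, value) tuples. Every point covered by a
    segment lies within error_bound of that segment's value.
    """
    segments = []
    start = 0
    lo = hi = values[0]
    for i in range(1, len(values)):
        new_lo, new_hi = min(lo, values[i]), max(hi, values[i])
        if new_hi - new_lo > 2 * error_bound:
            # The point does not fit: close the segment and start a new one.
            segments.append((start, i - start, (lo + hi) / 2))
            start, lo, hi = i, values[i], values[i]
        else:
            lo, hi = new_lo, new_hi
    segments.append((start, len(values) - start, (lo + hi) / 2))
    return segments


def reconstruct(segments):
    """Expand the segments back into one value per original data point."""
    out = []
    for _, length, value in segments:
        out.extend([value] * length)
    return out


# Hypothetical sensor readings with a single outlier at index 4.
readings = [10.0, 10.1, 9.9, 10.0, 25.0, 10.0, 10.1, 9.9, 10.0, 10.0]

averages = window_averages(readings, 5)       # the outlier is smeared into a mean
segments = constant_segments(readings, 0.5)   # the outlier gets its own segment
approx = reconstruct(segments)
```

In this sketch the windowed averages never exceed 14, so the reading of 25.0 is no longer visible, while the reconstructed series keeps it and stays within the chosen error bound of every original point.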
Publisher
Springer Berlin Heidelberg
Cited by
3 articles.
1. Why Model-Based Lossy Compression is Great for Wind Turbine Analytics;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13
2. Form-Based Semantic Caching on Time Series;Lecture Notes in Computer Science;2024
3. Machine Learning Platform for Extreme Scale Computing on Compressed IoT Data;2022 IEEE International Conference on Big Data (Big Data);2022-12-17