Affiliation:
1. Aalborg University, Denmark
Abstract
Industrial systems, e.g., wind turbines, generate large amounts of data at high velocity from reliable sensors. As it is infeasible to store and query such large amounts of data, only simple aggregates are currently stored. However, aggregates remove fluctuations and outliers that can reveal underlying problems, and they limit the knowledge that can be gained from historical data. As a remedy, we present the distributed Time Series Management System (TSMS) ModelarDB that uses models to store sensor data. We thus propose an online, adaptive multi-model compression algorithm that maintains data values within a user-defined error bound (possibly zero). We also propose (i) a database schema to store time series as models, (ii) methods to push down predicates to a key-value store utilizing this schema, (iii) optimized methods to execute aggregate queries on models, (iv) a method to optimize execution of projections through static code generation, and (v) dynamic extensibility that allows new models to be used without recompiling the TSMS. Further, we present a general modular distributed TSMS architecture and its implementation, ModelarDB, as a portable library, using Apache Spark for query processing and Apache Cassandra for storage. An experimental evaluation shows that, unlike current systems, ModelarDB hits a sweet spot and offers fast ingestion, good compression, and fast, scalable online aggregate query processing at the same time. This is achieved by dynamically adapting to data sets using multiple models. The system degrades gracefully as more outliers occur, and the actual errors are much lower than the bounds.
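The idea of multi-model compression within a user-defined error bound can be illustrated with a minimal sketch. This is not ModelarDB's algorithm or API: the two models (a constant and a straight line), the absolute error bound `eps`, the greedy segmentation, and all function names are assumptions chosen for illustration only. For each segment, every model is extended as far as it stays within the bound, and the model that covers the most points is kept.

```python
from typing import List, Sequence, Tuple

# (model name, start index, end index exclusive, model parameters); hypothetical layout
Segment = Tuple[str, int, int, Tuple[float, ...]]


def fit_constant(values: Sequence[float], eps: float) -> Tuple[int, Tuple[float, ...]]:
    """Longest prefix that one constant (the midrange) represents within eps."""
    lo = hi = values[0]
    n = 0
    for v in values:
        new_lo, new_hi = min(lo, v), max(hi, v)
        if (new_hi - new_lo) / 2 > eps:
            break
        lo, hi = new_lo, new_hi
        n += 1
    return n, ((lo + hi) / 2,)


def fit_linear(values: Sequence[float], eps: float) -> Tuple[int, Tuple[float, ...]]:
    """Greedily extend a line through the first and last point of the prefix."""
    best_n, best_params = 1, (values[0], 0.0)
    for n in range(2, len(values) + 1):
        slope = (values[n - 1] - values[0]) / (n - 1)
        if any(abs(values[0] + slope * i - values[i]) > eps for i in range(n)):
            break  # stop at the first prefix the line cannot represent
        best_n, best_params = n, (values[0], slope)
    return best_n, best_params


def compress(values: Sequence[float], eps: float) -> List[Segment]:
    """Split values into segments, picking per segment the model covering most points."""
    segments: List[Segment] = []
    start = 0
    while start < len(values):
        window = values[start:]
        candidates = [
            ("constant",) + fit_constant(window, eps),
            ("linear",) + fit_linear(window, eps),
        ]
        name, n, params = max(candidates, key=lambda c: c[1])
        segments.append((name, start, start + n, params))
        start += n
    return segments


def decompress(segments: Sequence[Segment]) -> List[float]:
    """Reconstruct approximate values from the stored models."""
    out: List[float] = []
    for name, start, end, params in segments:
        for i in range(end - start):
            if name == "constant":
                out.append(params[0])
            else:
                out.append(params[0] + params[1] * i)
    return out
```

With `eps = 0` the sketch only merges points that a model reproduces exactly, which mirrors the "possibly zero" bound mentioned above. A real system would fit models incrementally as data points arrive rather than re-scanning each window.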
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
27 articles.
1. Why Model-Based Lossy Compression is Great for Wind Turbine Analytics;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13
2. AdaEdge: A Dynamic Compression Selection Framework for Resource Constrained Devices;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13
3. DeepMapping: Learned Data Mapping for Lossless Compression and Efficient Lookup;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13
4. Big Data Storage and Analysis System for Space Application;2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD);2024-05-08
5. High-performance Effective Scientific Error-bounded Lossy Compression with Auto-tuned Multi-component Interpolation;Proceedings of the ACM on Management of Data;2024-03-12