Affiliation:
1. The Chinese University of Hong Kong, Hong Kong, China
Abstract
Timeseries management systems play an important role in IoT and performance monitoring. As data volumes grow, ingesting data memory-efficiently and at high throughput has become a key requirement for timeseries management systems. However, the designs of existing systems, especially their in-memory data structures, suffer from two issues. First, they face a trade-off between memory efficiency and performance. Second, they do not scale: lock contention prevents them from benefiting from parallel insertion and querying.
In this paper, we propose ForestTI, a scalable inverted-index-oriented timeseries management system in which the balance point between memory efficiency and performance can be flexibly adjusted as memory pressure increases. First, we present a two-level inverted index that scales through optimistic lock coupling and whose internal structure can be gradually converted to more memory-efficient representations. Second, we propose a two-level pointer swizzling mechanism that actively swaps out cold posting lists and in-memory timeseries objects as the number of timeseries grows. Finally, we further optimize the on-disk data structures (i.e., write-ahead logs and the LSM-tree) to keep up with the high insertion throughput of the in-memory components. We implemented a prototype of ForestTI from scratch in C++; compared with the storage engine of Prometheus, ForestTI achieves 1.79x higher insertion throughput, 52.1% lower query latency, and 56.9% lower memory occupation. We have released the source code of ForestTI as open source.
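The abstract credits optimistic lock coupling for the scalability of the inverted index. As a rough illustration only, the following is a minimal sketch of the per-node version lock that optimistic lock coupling is typically built on. Readers take no lock: they record a version, read, then validate that the version is unchanged, and restart otherwise. Writers make the version odd while they hold the lock and even again on release. The names `OptLock`, `optimistic_read` and `locked_write` are hypothetical and not taken from ForestTI. The sketch covers a single node. Full lock coupling would additionally re-validate the parent's version while descending to a child.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical per-node version lock (an illustration, not ForestTI's code).
// The low bit of version_ is the "locked" flag; each lock/unlock pair
// advances the version by 2, so a reader can detect any intervening write.
class OptLock {
 public:
  // Start an optimistic read: wait until no writer holds the lock, return the version.
  uint64_t read_begin() const {
    uint64_t v = version_.load(std::memory_order_acquire);
    while (v & 1) {  // odd version: a write is in progress
      std::this_thread::yield();
      v = version_.load(std::memory_order_acquire);
    }
    return v;
  }

  // Finish an optimistic read: true iff no writer locked the node since read_begin().
  bool read_validate(uint64_t v) const {
    std::atomic_thread_fence(std::memory_order_acquire);
    return version_.load(std::memory_order_relaxed) == v;
  }

  // Exclusive write lock: CAS the version from even (unlocked) to odd (locked).
  void write_lock() {
    uint64_t v = version_.load(std::memory_order_relaxed);
    for (;;) {
      if ((v & 1) == 0 &&
          version_.compare_exchange_weak(v, v + 1, std::memory_order_acquire,
                                         std::memory_order_relaxed)) {
        // Order the odd version before the protected stores that follow.
        std::atomic_thread_fence(std::memory_order_release);
        return;
      }
      if (v & 1) {  // another writer holds the lock: back off and reload
        std::this_thread::yield();
        v = version_.load(std::memory_order_relaxed);
      }
    }
  }

  // Release the lock: the version becomes even again, 2 larger than before.
  void write_unlock() { version_.fetch_add(1, std::memory_order_release); }

 private:
  std::atomic<uint64_t> version_{0};
};

// Hypothetical helper: read a protected value without taking the lock,
// retrying until a consistent snapshot is validated.
uint64_t optimistic_read(const OptLock& lock, const std::atomic<uint64_t>& value) {
  for (;;) {
    uint64_t v = lock.read_begin();
    uint64_t snapshot = value.load(std::memory_order_relaxed);
    if (lock.read_validate(v)) return snapshot;
  }
}

// Hypothetical helper: update a protected value under the write lock.
void locked_write(OptLock& lock, std::atomic<uint64_t>& value, uint64_t x) {
  lock.write_lock();
  value.store(x, std::memory_order_relaxed);
  lock.write_unlock();
}
```

Because readers never write to shared memory, concurrent lookups do not contend on a shared counter the way a reader-writer lock's reader count would. The cost is that a reader may have to retry when a writer intervenes.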
Funder
Direct Grant for Research, The Chinese University of Hong Kong
Research Grants Council of the Hong Kong Special Administrative Region, China
Publisher
Association for Computing Machinery (ACM)