Affiliation:
1. Hasso-Plattner-Institute, Potsdam, Germany
2. Parallel Computing Lab, Intel Corporation
Abstract
Read-optimized columnar databases use differential updates to handle writes: they maintain a separate write-optimized delta partition, which is periodically merged with the read-optimized, compressed main partition. This merge process introduces significant overhead and unacceptable downtime in update-intensive systems that aspire to combine transactional and analytical workloads in one system.
In the first part of the paper, we report data analyses of 12 SAP Business Suite customer systems. In the second part, we present an optimized merge process that reduces the merge overhead of current systems by a factor of 30. Our linear-time merge algorithm exploits the high compute and bandwidth resources of modern multi-core CPUs through architecture-aware optimizations and efficient parallelization. This enables compressed in-memory column stores to handle the transactional update rate required by enterprise applications while keeping the properties of read-optimized databases for analytic-style queries.
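To make the delta-to-main merge concrete, the following is a minimal single-threaded sketch of merging one column. It is not the paper's algorithm: it assumes the main partition is stored as a sorted dictionary plus a vector of value ids, and that the delta is a plain list of uncompressed values. The function name `merge_column` and all parameter names are hypothetical. The sketch also sorts the delta values and uses a binary search to encode them, so unlike the linear-time merge described in the abstract it is not strictly linear, and it omits all parallelization and architecture-aware tuning.

```python
from bisect import bisect_left


def merge_column(main_dict, main_ids, delta_values):
    """Merge an uncompressed delta into a dictionary-encoded main column.

    main_dict    -- sorted list of distinct values (hypothetical layout)
    main_ids     -- one index into main_dict per row of the main partition
    delta_values -- raw values appended since the last merge
    """
    # Assumption: the sketch sorts the delta here; a real delta structure
    # might already provide its distinct values in sorted order.
    delta_sorted = sorted(set(delta_values))

    # Step 1: merge the two sorted dictionaries, recording old id -> new id.
    new_dict = []
    old_to_new = [0] * len(main_dict)
    i = j = 0
    while i < len(main_dict) or j < len(delta_sorted):
        take_main = j == len(delta_sorted) or (
            i < len(main_dict) and main_dict[i] <= delta_sorted[j]
        )
        if take_main:
            value = main_dict[i]
            old_to_new[i] = len(new_dict)
            if j < len(delta_sorted) and delta_sorted[j] == value:
                j += 1  # value occurs in both dictionaries; keep one copy
            i += 1
        else:
            value = delta_sorted[j]
            j += 1
        new_dict.append(value)

    # Step 2: rewrite the main value ids, then append the encoded delta rows.
    new_ids = [old_to_new[v] for v in main_ids]
    new_ids.extend(bisect_left(new_dict, v) for v in delta_values)
    return new_dict, new_ids
```

Decoding the result with `[new_dict[i] for i in new_ids]` should yield the main rows followed by the delta rows in their original order, which is a quick way to check that the merge preserved the column contents.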
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
81 articles.
1. Two Birds With One Stone: Designing a Hybrid Cloud Storage Engine for HTAP;Proceedings of the VLDB Endowment;2024-07
2. Rethinking the Encoding of Integers for Scans on Skewed Data;Proceedings of the ACM on Management of Data;2023-12-08
3. S/C: Speeding up Data Materialization with Bounded Memory;2023 IEEE 39th International Conference on Data Engineering (ICDE);2023-04
4. Data Management and Visual Information Processing using Machine Learning;2022 5th International Conference on Contemporary Computing and Informatics (IC3I);2022-12-14
5. Parallel Maintenance of Materialized Views in Large-Scale Analytic Platforms;International Journal of Organizational and Collective Intelligence;2022-07-21