DoppelGanger++: Towards Fast Dependency Graph Generation for Database Replay

Published: 2024-03-12, Volume: 2, Issue: 1, Pages: 1-26
ISSN: 2836-6573
Journal: Proceedings of the ACM on Management of Data
Language: en
Abbreviated title: Proc. ACM Manag. Data

Authors:

Lee Wonseok¹, Ha Jaehyun¹, Han Wook-Shin¹, Park Changgyoo², Park Myunggon², Han Juhyeng², Lee Juchang²

Affiliations:

1. POSTECH, Pohang, Republic of Korea

2. SAP Labs Korea, Seoul, Republic of Korea

Abstract

A database replay system (DRS) captures workloads on a production system and then replays them on a test system, so that various system changes can be tested without risk before they are applied in production. Dependency graph generation in a DRS is crucial for preserving output determinism while maximizing concurrency. The state-of-the-art dependency graph generation algorithm, deployed in a commercial DBMS, uses a generate-and-prune strategy: it first generates a dependency graph by performing a backward scan for each request in a workload, and then prunes all redundant edges using an expensive transitive reduction algorithm. However, this approach generates a large dependency graph containing many redundant edges, and its worst-case time complexity is quadratic in the number of requests in a workload. To address these problems, we formally propose four classes of dependency graphs for DRSs. We then present a stateful single forward scan algorithm, SSFS, which generates any class of dependency graph with a single scan over all requests while succinctly maintaining states. Here, states refer to the information that is stored and maintained to make dependency graph generation efficient. We also propose a parallel SSFS that exploits multi-core CPUs while balancing the load. We implemented our DRS in a leading commercial DBMS. Extensive experiments using the TPC-C and SD benchmarks and a real-world customer workload show that our DRS reduces dependency graph generation time by up to two orders of magnitude compared to the state of the art.
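The contrast between generate-and-prune and a stateful forward scan can be illustrated with a small sketch. It assumes a simplified model that the abstract does not specify. Each request is reduced to hypothetical read and write sets of keys, and two requests conflict when they share a key that at least one of them writes. `backward_scan` followed by `transitive_reduction` mimics the generate-and-prune baseline, which compares each request with every earlier one. `forward_scan` is one possible stateful single pass that keeps, per key, the last writer and the readers since that write. All names (`Request`, `conflicts`, `forward_scan`, and so on) are illustrative. This is not the paper's SSFS, its four graph classes, or its parallel variant.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical request model: each captured request carries the keys it reads and writes.
@dataclass(frozen=True)
class Request:
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def conflicts(a: Request, b: Request) -> bool:
    """True if the requests share a key and at least one of them writes it."""
    return bool(a.writes & (b.reads | b.writes) or a.reads & b.writes)


def backward_scan(reqs):
    """Generate step: scan all earlier requests backward for each request (O(n^2) pairs)."""
    edges = set()
    for j in range(len(reqs)):
        for i in range(j - 1, -1, -1):
            if conflicts(reqs[i], reqs[j]):
                edges.add((i, j))
    return edges


def reachability(n, edges):
    """Successor and descendant sets; every edge points from a lower to a higher index."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
    reach = [set() for _ in range(n)]
    for u in range(n - 1, -1, -1):  # descendants of higher indices are already known
        for v in succ[u]:
            reach[u].add(v)
            reach[u] |= reach[v]
    return succ, reach


def transitive_reduction(n, edges):
    """Prune step: drop (u, v) if v is reachable through another successor of u."""
    succ, reach = reachability(n, edges)
    return {(u, v) for u, v in edges
            if not any(v in reach[w] for w in succ[u] if w != v)}


def forward_scan(reqs):
    """Single forward pass; per-key state is the last writer and the readers since it."""
    last_writer = {}
    readers = defaultdict(list)
    edges = set()
    for j, r in enumerate(reqs):
        for k in r.reads:  # read after write: depend only on the latest writer
            if k in last_writer:
                edges.add((last_writer[k], j))
        for k in r.writes:  # write after read, or write after write if nobody read since
            if readers[k]:
                edges.update((i, j) for i in readers[k])
            elif k in last_writer:
                edges.add((last_writer[k], j))
        # Update the state only after all edges into request j are emitted.
        for k in r.reads - r.writes:
            readers[k].append(j)
        for k in r.writes:
            last_writer[k] = j
            readers[k] = []
    return edges


reqs = [
    Request(writes=frozenset({"x"})),                          # 0: write x
    Request(reads=frozenset({"x"})),                           # 1: read x
    Request(reads=frozenset({"x"})),                           # 2: read x
    Request(writes=frozenset({"x"})),                          # 3: write x
    Request(reads=frozenset({"x"}), writes=frozenset({"y"})),  # 4: read x, write y
]
full = backward_scan(reqs)
print(len(full), sorted(transitive_reduction(len(reqs), full)))
# prints: 7 [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
print(sorted(forward_scan(reqs)))
# prints: [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
```

In this toy model, the forward-scan graph uses a subset of the baseline's edges with the same reachability, so both prune to the same transitive reduction. The forward pass avoids comparing every request with all earlier ones. It can still emit some redundant edges, for example when a request both reads and writes the same key.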

Funder

National Research Foundation of Korea

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3639322

References (39 articles)

1. 2010. TPC Benchmark C. http://www.tpc.org/tpcc/. Accessed: 2022-06-23.

2. 2022. Capturing and Replaying Workloads. https://help.sap.com/docs/SAP_HANA_COCKPIT/afa922439b204e9caf22c78b6b69e4f2/4f3c88249d484b0faba0e6b27b82c2dd.html?locale=en-US

3. 2022. SAP Standard Application Benchmarks. https://www.sap.com/about/benchmark.html.

4. 2022. SQL Server Distributed Replay. https://learn.microsoft.com/en-us/sql/tools/distributed-replay/sql-server-distributed-replay?view=sql-server-ver16

5. 2023. The Internals of PostgreSQL. https://www.interdb.jp/. Accessed: 2023-10-20.