DoppelGanger++: Towards Fast Dependency Graph Generation for Database Replay

Authors:

Lee Wonseok (1), Ha Jaehyun (1), Han Wook-Shin (1), Park Changgyoo (2), Park Myunggon (2), Han Juhyeng (2), Lee Juchang (2)

Affiliations:

1. POSTECH, Pohang, Republic of Korea

2. SAP Labs Korea, Seoul, Republic of Korea

Abstract

A database replay system (DRS) captures workloads on a production system and replays them on a test system so that system changes can be evaluated without risk before they are applied in production. Dependency graph generation in a DRS is crucial for preserving output determinism while maximizing concurrency. The state-of-the-art dependency graph generation algorithm deployed in a commercial DBMS uses a generate-and-prune strategy: it first generates a dependency graph by performing a backward scan for each request in a workload, and then prunes all redundant edges using an expensive transitive reduction algorithm. However, we observe that this strategy generates a large dependency graph containing many redundant edges, and that its worst-case time complexity is quadratic in the number of requests in a workload. To address these problems, we formally propose four classes of dependency graphs for DRSs. We then present a stateful single forward scan algorithm, SSFS, that generates any class of dependency graph by performing a single scan over all requests while succinctly maintaining states. Here, states refer to information that is stored and maintained for efficient dependency graph generation. We also propose a parallel SSFS that exploits the computing power of multi-core CPUs while balancing the load. We implemented our DRS in a leading commercial DBMS. Extensive experiments using the TPC-C and SD benchmarks and a real-world customer workload show that our DRS improves dependency graph generation time by up to two orders of magnitude compared to the state of the art.
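
To make the contrast between per-request backward scans and a single stateful forward scan concrete, here is a minimal Python sketch. It is not the SSFS algorithm from the paper, whose details the abstract does not give. It assumes a simplified conflict model in which each request declares the keys it reads and writes, and the names `Request` and `forward_scan_edges` are hypothetical. The per-key state (last writer, readers since that write) is what lets one pass emit edges without rescanning earlier requests.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical captured request: an id plus assumed read/write key sets."""
    rid: int
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def forward_scan_edges(requests):
    """Return (earlier_rid, later_rid) dependency edges in one forward pass."""
    last_writer = {}                         # key -> rid of the most recent writer
    readers_since_write = defaultdict(set)   # key -> rids that read since that write
    edges = set()

    for req in requests:
        # A read depends on the most recent write to the same key.
        for key in req.reads:
            if key in last_writer:
                edges.add((last_writer[key], req.rid))

        # A write depends on the readers since the last write; only if there
        # are none does it depend directly on the last writer. Skipping the
        # direct writer edge when readers exist avoids a transitive edge.
        for key in req.writes:
            readers = readers_since_write[key]
            if readers:
                for reader in readers:
                    edges.add((reader, req.rid))
            elif key in last_writer:
                edges.add((last_writer[key], req.rid))

        # Update the per-key state only after this request's edges are emitted.
        for key in req.reads:
            if key not in req.writes:
                readers_since_write[key].add(req.rid)
        for key in req.writes:
            last_writer[key] = req.rid
            readers_since_write[key] = set()

    return edges


workload = [
    Request(0, writes=frozenset({"x"})),
    Request(1, reads=frozenset({"x"})),
    Request(2, reads=frozenset({"x"})),
    Request(3, writes=frozenset({"x"})),
    Request(4, writes=frozenset({"y"})),
]
print(sorted(forward_scan_edges(workload)))
```

In this toy workload the edge from request 0 to request 3 is never generated, because it is implied by the paths through requests 1 and 2. A generate-and-prune approach would first create such an edge and then remove it by transitive reduction.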

Funder

National Research Foundation of Korea

Publisher

Association for Computing Machinery (ACM)

