GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization

Authors:

Lao Jiale¹, Wang Yibo¹, Li Yufei¹, Wang Jianping², Zhang Yunjia³, Cheng Zhiyuan⁴, Chen Wanghu², Tang Mingjie¹, Wang Jianguo⁴

Affiliation:

1. Sichuan University

2. Northwest Normal University

3. University of Wisconsin-Madison

4. Purdue University

Abstract

Modern database management systems (DBMS) expose hundreds of configurable knobs to control system behavior. Determining appropriate values for these knobs to improve DBMS performance is a long-standing problem in the database community. As the number of knobs to tune keeps growing and each knob can take continuous or categorical values, manual tuning becomes impractical. Recently, automatic tuning systems using machine learning methods have shown great potential. However, existing approaches still incur significant tuning costs or yield only sub-optimal performance, because they either ignore the extensive domain knowledge available (e.g., DBMS manuals and forum discussions) and rely solely on the runtime feedback of benchmark evaluations to guide the optimization, or they utilize the domain knowledge in a limited way. Hence, we propose GPTuner, a manual-reading database tuning system that leverages domain knowledge extensively and automatically to optimize the search space and enhance the runtime feedback-based optimization process. First, we develop a Large Language Model (LLM)-based pipeline to collect and refine heterogeneous knowledge, and propose a prompt ensemble algorithm to unify a structured view of the refined knowledge. Second, using the structured knowledge, we (1) design a workload-aware and training-free knob selection strategy, (2) develop a search space optimization technique considering the value range of each knob, and (3) propose a Coarse-to-Fine Bayesian Optimization framework to explore the optimized space. Finally, we evaluate GPTuner under different benchmarks (TPC-C and TPC-H), metrics (throughput and latency), and DBMSs (PostgreSQL and MySQL). Compared to state-of-the-art approaches, GPTuner identifies better configurations in 16x less time on average. Moreover, GPTuner achieves up to 30% performance improvement (higher throughput or lower latency) over the best-performing alternative.
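The abstract describes the Coarse-to-Fine framework only at a high level. The sketch below illustrates one plausible reading of the idea under stated assumptions: a first Bayesian Optimization stage over a small, knowledge-pruned region of each knob, whose observations then warm-start a second stage over the wider optimized space. It uses scikit-optimize; the knob names (shared_buffers, work_mem), value ranges, and synthetic cost function are hypothetical stand-ins, not the authors' implementation, for which readers should consult the full paper.

```python
# A minimal sketch (not the authors' implementation) of a coarse-to-fine
# Bayesian Optimization loop over a DBMS knob search space, using
# scikit-optimize. Knob names, value ranges, and the synthetic cost
# below are illustrative assumptions.
from skopt import gp_minimize
from skopt.space import Integer


def run_benchmark(shared_buffers_mb, work_mem_mb):
    # Stand-in for an expensive benchmark run (e.g., TPC-C): apply the
    # knobs to the DBMS, replay the workload, and return a cost to
    # minimize (negative throughput, or latency). A synthetic bowl with
    # its optimum at (4096, 64) is used so the sketch is runnable.
    return float((shared_buffers_mb - 4096) ** 2 + (work_mem_mb - 64) ** 2)


def objective(knobs):
    shared_buffers_mb, work_mem_mb = knobs
    return run_benchmark(shared_buffers_mb, work_mem_mb)


# Coarse stage: explore a small, knowledge-pruned region of each knob
# (e.g., the range a manual recommends) with a limited evaluation budget.
coarse_space = [Integer(1024, 8192, name="shared_buffers_mb"),
                Integer(16, 256, name="work_mem_mb")]
coarse = gp_minimize(objective, coarse_space, n_calls=15, random_state=0)

# Fine stage: widen the space but seed the optimizer with the coarse
# observations, so the search starts from an already-promising region.
fine_space = [Integer(128, 32768, name="shared_buffers_mb"),
              Integer(4, 2048, name="work_mem_mb")]
fine = gp_minimize(objective, fine_space, n_calls=30,
                   x0=coarse.x_iters, y0=list(coarse.func_vals),
                   random_state=0)

print("best knobs:", fine.x, "best cost:", fine.fun)
```

Warm-starting the fine stage with the coarse observations is what keeps the extra exploration cheap: the Gaussian-process surrogate begins with evidence about the promising region instead of purely random samples.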

Publisher

Association for Computing Machinery (ACM)

