Generating Succinct Descriptions of Database Schemata for Cost-Efficient Prompting of Large Language Models

Published:2024-07 Issue:11 Volume:17 Page:3511-3523
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Trummer Immanuel¹

Affiliation:

1. Cornell University, Ithaca, New York, USA

Abstract

Using large language models (LLMs) for tasks like text-to-SQL translation often requires describing the database schema as part of the model input. LLM providers typically charge as a function of the number of tokens read. Hence, reducing the length of the schema description saves money at each model invocation. This paper introduces Schemonic, a system that automatically finds concise text descriptions of relational database schemata. By introducing abbreviations or grouping schema elements with similar properties, Schemonic typically finds descriptions that use significantly fewer tokens than naive schema representations. Internally, Schemonic models schema compression as a combinatorial optimization problem and uses integer linear programming solvers to find guaranteed optimal or near-optimal solutions. It speeds up optimization by starting optimization from heuristic solutions and reducing the search space size via pre-processing. The experiments on TPC-H, SPIDER, and Public-BI demonstrate that Schemonic reduces schema description length significantly, along with fees for reading them, without reducing the accuracy in tasks such as text-to-SQL translation.
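The idea of grouping schema elements with similar properties can be illustrated with a small sketch. This is not Schemonic's algorithm (the abstract states that it relies on integer linear programming); it is only a greedy toy that groups columns by data type. The schema, the function names, and the whitespace-based token count are all assumptions made for illustration; a real measurement would use the tokenizer of the target LLM.

```python
from collections import defaultdict

# Hypothetical toy schema (not taken from the paper): table -> [(column, type)].
SCHEMA = {
    "orders": [("o_id", "int"), ("o_custkey", "int"),
               ("o_total", "decimal"), ("o_comment", "text")],
    "customer": [("c_id", "int"), ("c_name", "text"), ("c_address", "text")],
}


def naive_description(schema):
    """Describe every column together with its own type annotation."""
    parts = []
    for table, columns in schema.items():
        cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
        parts.append(f"table {table} ( {cols} )")
    return " ; ".join(parts)


def grouped_description(schema):
    """State each type once per table, followed by the columns that share it."""
    parts = []
    for table, columns in schema.items():
        by_type = defaultdict(list)
        for name, ctype in columns:
            by_type[ctype].append(name)
        groups = " ".join(
            f"{ctype}: {', '.join(names)}" for ctype, names in by_type.items()
        )
        parts.append(f"table {table} ( {groups} )")
    return " ; ".join(parts)


def approx_tokens(text):
    """Crude proxy for token count: whitespace-separated pieces (an assumption)."""
    return len(text.split())


if __name__ == "__main__":
    for describe in (naive_description, grouped_description):
        text = describe(SCHEMA)
        print(approx_tokens(text), text)
```

Even on this tiny schema the grouped form is shorter under the proxy count, because each type name appears once per table rather than once per column. An optimizer in the spirit of the abstract would search over many such groupings and abbreviations instead of applying a single fixed rule.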

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.14778/3681954.3682017
