Building Efficient Query Engines in a High-Level Language

Author:

Amir Shaikhha1, Yannis Klonatos1, Christoph Koch1

Affiliation:

1. EPFL, Lausanne, Switzerland

Abstract

Abstraction without regret refers to the vision of using high-level programming languages for systems development without a negative impact on performance. A database system designed according to this vision offers both increased productivity and high performance. Existing monolithic implementations, by contrast, sacrifice the former for the latter and are hard to maintain and extend. In this article, we realize this vision in the domain of analytical query processing. We present LegoBase, a query engine written in the high-level programming language Scala. The key technique for regaining efficiency is generative programming: LegoBase performs source-to-source compilation and optimizes database systems code by converting the high-level Scala code into specialized, low-level C code. We show how generative programming makes it easy to implement a wide spectrum of optimizations, such as introducing data partitioning or switching from a row to a column data layout. Such optimizations are difficult to achieve with existing low-level query compilers, which handle only queries. We demonstrate that sufficiently powerful abstractions are essential for dealing with the complexity of the optimization effort, because they shield developers from compiler internals and decouple individual optimizations from each other. We evaluate our approach with the TPC-H benchmark and show the following:

(a) With all optimizations enabled, our architecture significantly outperforms a commercial in-memory database as well as an existing query compiler.
(b) Programmers need to provide just a few hundred lines of high-level code to implement the optimizations, instead of the complicated low-level code required by existing query compilation approaches.
(c) These optimizations may come at the cost of using more system memory in exchange for improved performance.
(d) The compilation overhead is low compared to the overall execution time, which makes our approach usable in practice for compiling query engines.
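The abstract's central idea is to generate specialized low-level C code from a high-level description, so that a layout change becomes a decision inside the generator rather than a hand-rewrite of every operator. A toy code generator can illustrate this idea. The sketch below is written in Python for brevity. It is not LegoBase's Scala implementation.

All of the following names are hypothetical, chosen only for illustration:

- the node types `Col`, `Const`, `Lt`, and `SumWhere`;
- the functions `interpret`, `gen_expr`, and `gen_c`;
- the TPC-H-style column names;
- the C globals `rows` and per-column arrays that the emitted code assumes exist.

The sketch contrasts a generic interpreter, which re-walks the query tree for every row, with a generator that emits one C loop specialized to a single query and to either a row or a column layout.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical query AST: a tiny subset of what a real engine would need.
@dataclass(frozen=True)
class Col:
    name: str  # attribute name, e.g. a TPC-H-style column


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Lt:
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Const, Lt]


@dataclass(frozen=True)
class SumWhere:
    """SELECT SUM(agg) FROM table WHERE pred."""
    table: str
    agg: Expr
    pred: Expr


def interpret(q: SumWhere, rows: list) -> int:
    """Generic evaluation: walks the expression tree again for every row."""
    def ev(e: Expr, row: dict) -> int:
        if isinstance(e, Col):
            return row[e.name]
        if isinstance(e, Const):
            return e.value
        if isinstance(e, Lt):
            return int(ev(e.left, row) < ev(e.right, row))
        raise TypeError(f"unsupported node: {e!r}")
    return sum(ev(q.agg, r) for r in rows if ev(q.pred, r))


def gen_expr(e: Expr, layout: str) -> str:
    """Emit a C expression; the layout only changes how a column is read."""
    if isinstance(e, Col):
        # Column layout: one C array per attribute (assumed globals).
        # Row layout: an array of structs named `rows` (also assumed).
        return f"{e.name}[i]" if layout == "column" else f"rows[i].{e.name}"
    if isinstance(e, Const):
        return str(e.value)
    if isinstance(e, Lt):
        return f"({gen_expr(e.left, layout)} < {gen_expr(e.right, layout)})"
    raise TypeError(f"unsupported node: {e!r}")


def gen_c(q: SumWhere, layout: str = "row") -> str:
    """Emit a C function specialized to this one query and one layout."""
    if layout not in ("row", "column"):
        raise ValueError(f"unknown layout: {layout}")
    return "\n".join([
        f"long query_{q.table}(long n) {{",
        "  long acc = 0;",
        "  for (long i = 0; i < n; i++) {",
        f"    if ({gen_expr(q.pred, layout)}) acc += {gen_expr(q.agg, layout)};",
        "  }",
        "  return acc;",
        "}",
    ])


if __name__ == "__main__":
    q = SumWhere("lineitem", agg=Col("l_extendedprice"),
                 pred=Lt(Col("l_quantity"), Const(24)))
    rows = [{"l_quantity": 3, "l_extendedprice": 10},
            {"l_quantity": 30, "l_extendedprice": 5}]
    print(interpret(q, rows))  # 10: only the first row passes the filter
    print(gen_c(q, layout="row"))
    print(gen_c(q, layout="column"))
```

For the column layout, the emitted loop body is `if ((l_quantity[i] < 24)) acc += l_extendedprice[i];`. The row layout reads `rows[i].l_quantity` instead. The two differ only in how `gen_expr` handles `Col`, so switching layouts changes one branch of the generator, not every operator. In this toy, `interpret` pays for tree traversal and dictionary lookups on every row, whereas the emitted C contains only the operations this query needs. The toy does not attempt the engine-level transformations the abstract describes for LegoBase.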

Funder

Google Ph.D. Fellowship, NCCR MARVEL, and ERC

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems

Cited by 27 articles.

1. Incremental Fusion: Unifying Compiled and Vectorized Query Execution. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 2024-05-13.

2. Optimizing Nested Recursive Queries. Proceedings of the ACM on Management of Data, 2024-03-12.

3. Program generation meets program verification: A case study on number-theoretic transform. Science of Computer Programming, 2024-01.

4. Mat2Stencil: A Modular Matrix-Based DSL for Explicit and Implicit Matrix-Free PDE Solvers on Structured Grid. Proceedings of the ACM on Programming Languages, 2023-10-16.

5. Efficient Query Processing in Python Using Compilation. Companion of the 2023 International Conference on Management of Data, 2023-06-04.