Progressive Partitioning for Parallelized Query Execution in Google's Napa

Published: 2023-08
Volume: 16
Issue: 12
Pages: 3475-3487
ISSN: 2150-8097
Container-title: Proceedings of the VLDB Endowment
Language: en
Short-container-title: Proc. VLDB Endow.

Authors:

Tatemura Junichi¹, Zou Tao¹, Sankaranarayanan Jagan¹, Huang Yanlai¹, Chen Jim¹, Zhang Yupu¹, Lai Kevin¹, Zhang Hao¹, Manoharan Gokul Nath Babu¹, Graefe Goetz¹, Agrawal Divyakant¹, Adelberg Brad¹, Kolhar Shilpa¹, Roy Indrajit¹

Affiliation:

1. Google Inc

Abstract

Napa holds Google's critical data warehouses in log-structured merge trees for real-time data ingestion and sub-second response for billions of queries per day. These queries are often multi-key lookups in highly skewed tables and indexes. In our production experience, only progressive query-specific partitioning can achieve Napa's strict query latency SLOs. Here we advocate good-enough partitioning that keeps the per-query partitioning time low without risking uneven work distribution. Our design combines pragmatic system choices and algorithmic innovations. For instance, B-trees are augmented with statistics of key distributions, thus serving the dual purpose of aiding lookups and partitioning. Furthermore, progressive partitioning is designed to be "good enough," thereby balancing partitioning time with performance. The resulting system is robust and successfully serves billions of queries day in, day out with very high quality of service, forming a core infrastructure at Google.
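The abstract's two ideas, count statistics kept in B-tree nodes and partitioning that refines only until it is "good enough," can be illustrated with a small sketch. It is not Napa's algorithm, which the abstract does not specify. It assumes a hypothetical bulk-loaded tree with distinct keys in which every node records how many keys lie below it; the names `Node`, `build`, `progressive_partition`, and the `slack` parameter are all illustrative. The sketch descends from the root only into subtrees whose counts are too coarse relative to the per-partition target, then cuts the resulting ordered list of subtrees greedily by cumulative count.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical count-augmented B-tree node (not Napa's actual layout).

    Leaves hold sorted keys; interior nodes hold children. Every node stores
    `count`, the number of keys in its subtree: the key-distribution
    statistic that lets partitioning avoid scanning the data itself.
    """
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)
    count: int = 0

    @property
    def is_leaf(self) -> bool:
        return not self.children

    @property
    def min_key(self) -> int:
        # Smallest key in this subtree (follow the leftmost path).
        return self.keys[0] if self.is_leaf else self.children[0].min_key


def build(sorted_keys: List[int], fanout: int = 4) -> Node:
    """Bulk-load a count-augmented tree bottom-up from distinct sorted keys."""
    if not sorted_keys:
        return Node()
    level = [Node(keys=sorted_keys[i:i + fanout],
                  count=len(sorted_keys[i:i + fanout]))
             for i in range(0, len(sorted_keys), fanout)]
    while len(level) > 1:
        level = [Node(children=level[i:i + fanout],
                      count=sum(c.count for c in level[i:i + fanout]))
                 for i in range(0, len(level), fanout)]
    return level[0]


def progressive_partition(root: Node, k: int, slack: float = 0.25) -> List[int]:
    """Return up to k-1 split keys dividing the keys into ~equal-count ranges.

    A subtree is expanded into its children only while its count exceeds
    slack * target, so coarse statistics near the root are used wherever
    they are already good enough. Leaves are never split internally.
    """
    if k <= 1 or root.count == 0:
        return []
    target = root.count / k
    frontier = [root]  # ordered, disjoint subtrees covering all keys
    while True:
        refined: List[Node] = []
        expanded = False
        for node in frontier:
            if not node.is_leaf and node.count > slack * target:
                refined.extend(node.children)  # descend one level
                expanded = True
            else:
                refined.append(node)
        frontier = refined
        if not expanded:
            break
    # Greedy cut: start a new range at the first subtree whose preceding
    # cumulative count reaches the next multiple of the target.
    splits: List[int] = []
    running = 0
    for node in frontier:
        if len(splits) < k - 1 and running >= target * (len(splits) + 1):
            splits.append(node.min_key)
        running += node.count
    return splits


root = build(list(range(100)), fanout=4)
print(progressive_partition(root, 4))  # [28, 52, 76]
```

On 100 sequential keys with fanout 4, the split keys `[28, 52, 76]` yield ranges of 28, 24, 24, and 24 keys: close to, though not exactly, the ideal 25 each. In this sketch `slack` is the knob that trades partitioning effort (how deep the descent goes) against balance, and because leaves are never split internally, leaf size bounds how fine the result can be.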

Publisher

Association for Computing Machinery (ACM)

Subject

General Earth and Planetary Sciences, Water Science and Technology, Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3611540.3611541

References (25 articles)

1. S. Agrawal, V. R. Narasayya, and B. Yang. 2004. Integrating Vertical and Horizontal Partitioning Into Automated Physical Database Design. In SIGMOD (Paris, France). 359-370.

2. A. Ailamaki, D. J. DeWitt, M. D. Hill, and M. Skounakis. 2001. Weaving Relations for Cache Performance. In VLDB (Rome, Italy). 169-180.

3. Parallel algorithms for the execution of relational database operations

4. H. Boral and D. J. DeWitt. 1980. Design Considerations for Data-flow Database Machines. In SIGMOD (Santa Monica, CA). 94-104.

5. J. C. 2013. Spanner: Google's Globally Distributed Database. ACM Trans. Comput. Syst.