Affiliation:
1. University of Oxford
2. University of Edinburgh
Abstract
While large-scale distributed data processing platforms have become an attractive target for query processing, these systems are problematic for applications that deal with nested collections. Programmers are forced either to perform non-trivial translations of collection programs or to employ automated flattening procedures, both of which lead to performance problems. These challenges only worsen for nested collections with skewed cardinalities, where both handcrafted rewriting and automated flattening are unable to enforce load balancing across partitions.
In this work, we propose a framework that translates a program manipulating nested collections into a set of semantically equivalent shredded queries that can be efficiently evaluated. The framework employs a combination of query compilation techniques, an efficient data representation for nested collections, and automated skew-handling. We provide an extensive experimental evaluation, demonstrating significant improvements provided by the framework in diverse scenarios for nested collection programs.
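As a rough illustration of what "shredding" a nested collection means, the sketch below splits a one-level nested collection into a flat top-level relation and a flat child relation joined by labels, then rebuilds the nested form. It is a minimal sketch and not the framework's implementation: the names `shred`, `unshred`, `customers`, and `orders` are hypothetical, labels are assumed to be simple integer positions, and skew handling is not modelled.

```python
from collections import defaultdict


def shred(customers):
    """Split a nested collection into two flat relations.

    Assumption: each input record has a "name" and a nested "orders" list.
    The nested list is replaced by an integer label in the top-level
    relation, and its rows move to a flat relation tagged with that label.
    """
    top, orders_flat = [], []
    for label, customer in enumerate(customers):
        top.append({"name": customer["name"], "orders": label})
        for order in customer["orders"]:
            orders_flat.append({"label": label, **order})
    return top, orders_flat


def unshred(top, orders_flat):
    """Rebuild the nested collection by grouping child rows on their label."""
    groups = defaultdict(list)
    for row in orders_flat:
        groups[row["label"]].append(
            {key: value for key, value in row.items() if key != "label"}
        )
    return [{"name": t["name"], "orders": groups[t["orders"]]} for t in top]


nested = [
    {"name": "alice", "orders": [{"item": "pen", "qty": 2}, {"item": "ink", "qty": 1}]},
    {"name": "bob", "orders": []},
]

top, orders_flat = shred(nested)
# The round trip should reproduce the original nested collection.
assert unshred(top, orders_flat) == nested
```

Because both outputs are flat, each can be partitioned and processed independently of how large any single inner collection is, which is the property a flat representation is meant to provide.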
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
5 articles.
1. Comprehending queries over finite maps;International Symposium on Principles and Practice of Declarative Programming;2023-10-22
2. Incremental Processing of Structured Data in Datalog;Proceedings of the 21st ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences;2022-11-29
3. Functional collection programming with semi-ring dictionaries;Proceedings of the ACM on Programming Languages;2022-04-29
4. Scalable analysis of multi-modal biomedical data;GigaScience;2021-09
5. The Power of Nested Parallelism in Big Data Processing: Hitting Three Flies with One Slap;Proceedings of the 2021 International Conference on Management of Data;2021-06-09