Affiliation:
1. Stanford University
2. MIT CSAIL
3. Imperial College London
Abstract
Modern analytics applications use a diverse mix of libraries and functions. Unfortunately, there is no optimization across these libraries, resulting in performance penalties as high as an order of magnitude in many applications. To address this problem, we proposed Weld, a common runtime for existing data analytics libraries that performs key physical optimizations such as pipelining under existing, imperative library APIs. In this work, we further develop the Weld vision by designing an automatic adaptive optimizer for Weld applications, and evaluating its impact on realistic data science workloads. Our optimizer eliminates multiple forms of overhead that arise when composing imperative libraries like Pandas and NumPy, and uses lightweight measurements to make data-dependent decisions at run-time in ad-hoc workloads where no statistics are available, with sub-second overhead. We also evaluate which optimizations have the largest impact in practice and whether Weld can be integrated into libraries incrementally. Our results are promising: using our optimizer, Weld accelerates data science workloads by up to 23X on one thread and 80X on eight threads, and its adaptive optimizations provide up to a 3.75X speedup over rule-based optimization. Moreover, Weld provides benefits if even just 4 to 5 operators in a library are ported to use it. Our results show that common runtime designs like Weld may be a viable approach to accelerate analytics.
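The pipelining mentioned in the abstract can be illustrated with a small sketch. The example below is plain Python and does not use Weld's API or IR; the function names `unfused` and `fused`, the 0.9 discount factor, and the sample data are all hypothetical. It only shows the general idea: a chain of library-style calls that each materialize an intermediate result, compared with one fused loop that computes the same value in a single pass.

```python
# Illustrative sketch only: plain Python, not Weld's API or IR.
# Function names, the 0.9 factor, and the data are hypothetical.

def unfused(prices, quantities):
    """Mimic three separate library calls, each scanning the data
    and materializing a full intermediate list."""
    totals = [p * q for p, q in zip(prices, quantities)]  # pass 1: temporary list
    discounted = [t * 0.9 for t in totals]                # pass 2: temporary list
    return sum(discounted)                                # pass 3: reduction


def fused(prices, quantities):
    """Compute the same result in one pipelined loop with no
    intermediate lists."""
    acc = 0.0
    for p, q in zip(prices, quantities):
        acc += (p * q) * 0.9
    return acc


if __name__ == "__main__":
    prices = [10.0, 20.0, 30.0]
    quantities = [1, 2, 3]
    # Both variants compute the same discounted total.
    print(unfused(prices, quantities), fused(prices, quantities))
```

The fused version reads each input element once and allocates no temporaries, which is the kind of saving a cross-library optimizer could aim for when the individual operators come from different libraries.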
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
47 articles.
1. Query Compilation Without Regrets;Proceedings of the ACM on Management of Data;2024-05-29
2. QFusor: A UDF Optimizer Plugin for SQL Databases;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13
3. Sharing Queries with Nonequivalent User-defined Aggregate Functions;ACM Transactions on Database Systems;2024-04-10
4. BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach;Proceedings of the ACM on Management of Data;2023-11-13
5. Efficient Execution of User-Defined Functions in SQL Queries;Proceedings of the VLDB Endowment;2023-08