Babelfish

Author:

Philipp Marian Grulich¹, Steffen Zeuch¹, Volker Markl¹

Affiliation:

1. Technische Universität Berlin

Abstract

Today's users of data processing systems come from different domains, have different levels of expertise, and prefer different programming languages. As a result, analytical workload requirements have shifted from relational queries to polyglot queries involving user-defined functions (UDFs). Although some data processing systems support polyglot queries, they often embed third-party language runtimes. This embedding incurs high performance overhead because it forces additional data materialization between execution engines. In this paper, we present Babelfish, a novel data processing engine designed for polyglot queries. Babelfish introduces an intermediate representation that unifies queries from different implementation languages. This enables new, holistic optimizations across operator and language boundaries, e.g., operator fusion and workload specialization. As a result, Babelfish avoids data transfers and makes efficient use of hardware resources. Our evaluation shows that Babelfish outperforms state-of-the-art data processing systems by up to one order of magnitude and reaches the performance of handwritten code. With Babelfish, we bridge the performance gap between relational operators and multi-language UDFs and lay the foundation for the efficient execution of future polyglot workloads.
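The overhead the abstract attributes to embedded language runtimes comes from materializing intermediate results at each engine boundary. The toy Python sketch below contrasts that with operator fusion, where a filter UDF, a map UDF, and an aggregation run in a single loop without intermediate buffers. It is an illustration under stated assumptions only: the function names, the two UDFs, and the use of Python lists to stand in for an "engine boundary" are hypothetical and do not reflect Babelfish's intermediate representation or implementation.

```python
from typing import Callable, Iterable, List


# Hypothetical UDFs, standing in for functions written in different languages.
def is_large(x: int) -> bool:
    return x > 10


def scale(x: int) -> int:
    return x * 3


def materialized_plan(data: Iterable[int],
                      pred: Callable[[int], bool],
                      fn: Callable[[int], int]) -> int:
    """Each operator fully materializes its output before the next one runs.
    The lists model hand-offs between separate engines (an assumption)."""
    filtered: List[int] = [x for x in data if pred(x)]  # intermediate buffer 1
    mapped: List[int] = [fn(x) for x in filtered]       # intermediate buffer 2
    return sum(mapped)


def fused_plan(data: Iterable[int],
               pred: Callable[[int], bool],
               fn: Callable[[int], int]) -> int:
    """Filter, map, and sum fused into one loop: each record flows through
    all three operators and no intermediate collection is built."""
    total = 0
    for x in data:
        if pred(x):
            total += fn(x)
    return total


if __name__ == "__main__":
    rows = list(range(20))
    # Both plans compute the same result: 3 * (11 + 12 + ... + 19)
    print(materialized_plan(rows, is_large, scale), fused_plan(rows, is_large, scale))  # 405 405
```

Because both variants are interpreted Python, the sketch shows only the structural difference (no intermediate collections in the fused plan), not a speedup; the performance benefit described in the abstract depends on an engine that compiles the fused pipeline across operator and language boundaries.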

Publisher

Association for Computing Machinery (ACM)

Subject

General Earth and Planetary Sciences, Water Science and Technology, Geography, Planning and Development

Cited by 10 articles.

1. Accelerating Multilingual Applications with In-memory Array Sharing;2023 IEEE International Conference on Big Data (BigData);2023-12-15

2. Showcasing Data Management Challenges for Future IoT Applications with NebulaStream;Proceedings of the VLDB Endowment;2023-08

3. Efficient Execution of User-Defined Functions in SQL Queries;Proceedings of the VLDB Endowment;2023-08

4. Big Data Analytics from the Rich Cloud to the Frugal Edge;2023 IEEE International Conference on Edge Computing and Communications (EDGE);2023-07

5. In-Situ Cross-Database Query Processing;2023 IEEE 39th International Conference on Data Engineering (ICDE);2023-04