Check Out the Big Brain on BRAD: Simplifying Cloud Data Processing with Learned Automated Data Meshes

Author:

Tim Kraska¹, Tianyu Li², Samuel Madden², Markos Markakis², Amadou Ngom², Ziniu Wu², Geoffrey X. Yu²

Affiliation:

1. MIT CSAIL and Amazon Web Services

2. MIT CSAIL

Abstract

The last decade of database research has led to the prevalence of specialized systems for different workloads. Consequently, organizations often rely on a combination of specialized systems, organized in a Data Mesh. Data meshes present significant challenges for system administrators, including picking the right system for each workload, moving data between systems, maintaining consistency, and correctly configuring each system. Many non-expert end users (e.g., data analysts or app developers) either cannot solve their business problems, or suffer from sub-optimal performance or cost due to this complexity. We envision BRAD, a cloud system that automatically integrates and manages data and systems into an instance-optimized data mesh, allowing users to efficiently store and query data under a unified data model (i.e., relational tables) without knowledge of underlying system details. With machine learning, BRAD automatically deduces the strengths and weaknesses of each engine through a combination of offline training and online probing. Then, BRAD uses these insights to route queries to the most suitable (combination of) system(s) for efficient execution. Furthermore, BRAD automates configuration tuning, resource scaling, and data migration across component systems, and makes recommendations for more impactful decisions, such as adding or removing systems. As such, BRAD exemplifies a new class of systems that utilize machine learning and the cloud to make complex data processing more accessible to end users, raising numerous new problems in database systems, machine learning, and the cloud.
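The routing idea in the abstract (estimate how well each engine would handle a query, then send the query to the most suitable one) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the engine names, the `QueryFeatures` fields, and the hand-written cost functions are hypothetical stand-ins for learned models, and none of it is taken from BRAD itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class QueryFeatures:
    """Hypothetical per-query features a router might extract."""
    rows_scanned: int
    num_joins: int
    is_write: bool


# A cost model maps query features to a predicted run time in seconds.
# In a learned system this would be a trained model; here it is a toy formula.
CostModel = Callable[[QueryFeatures], float]


def transactional_cost(q: QueryFeatures) -> float:
    # Toy assumption: cheap point lookups and writes, expensive large scans.
    return 0.001 + 2e-6 * q.rows_scanned + 0.05 * q.num_joins


def warehouse_cost(q: QueryFeatures) -> float:
    # Toy assumption: fixed startup overhead, fast scans, costly writes.
    write_penalty = 1.0 if q.is_write else 0.0
    return 0.2 + 1e-8 * q.rows_scanned + 0.01 * q.num_joins + write_penalty


def route(q: QueryFeatures, models: Dict[str, CostModel]) -> str:
    """Return the name of the engine with the lowest predicted run time."""
    return min(models, key=lambda name: models[name](q))


# Hypothetical engine registry: engine name -> cost model.
ENGINES: Dict[str, CostModel] = {
    "transactional": transactional_cost,
    "warehouse": warehouse_cost,
}

point_lookup = QueryFeatures(rows_scanned=1, num_joins=0, is_write=False)
big_scan = QueryFeatures(rows_scanned=500_000_000, num_joins=3, is_write=False)

print(route(point_lookup, ENGINES))
print(route(big_scan, ENGINES))
```

With these toy formulas the point lookup goes to the transactional engine and the large scan goes to the warehouse engine. Keeping the cost models behind a single callable type means a trained predictor could replace either formula without changing `route`.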

Publisher

Association for Computing Machinery (ACM)

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

Cited by 6 articles.

1. On Building an End-To-End Prototype System for Harvesting Performance Characteristics of Code Snippets; International Conference on Information Systems Development; 2024-09-09

2. Data Mesh: A Systematic Gray Literature Review; ACM Computing Surveys; 2024-08-07

3. Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD; Proceedings of the VLDB Endowment; 2024-07

4. Mallet: SQL Dialect Translation with LLM Rule Generation; Proceedings of the Seventh International Workshop on Exploiting Artificial Intelligence Techniques for Data Management; 2024-06-09

5. Hybrid Data Management Architecture for Present Quantum Computing; Lecture Notes in Computer Science; 2024
