The Composable Data Management System Manifesto

Authors:

Pedro Pedreira (1), Orri Erling (1), Konstantinos Karanasos (1), Scott Schneider (1), Wes McKinney (2), Satya R Valluri (3), Mohamed Zait (3), Jacques Nadeau (4)

Affiliation:

1. Meta Platforms Inc.

2. Voltron Data

3. Databricks Inc.

4. Sundeck

Abstract

The requirement for specialization in data management systems has evolved faster than our software development practices. After decades of organic growth, this situation has created a siloed landscape of hundreds of products developed and maintained as monoliths, with limited reuse between systems. This fragmentation has often forced developers to reinvent the wheel, increased maintenance costs, and slowed innovation. It has also affected end users, who must often learn the idiosyncrasies of dozens of incompatible SQL and non-SQL API dialects and settle for systems with incomplete functionality and inconsistent semantics. In this vision paper, given the recent popularity of open source projects aimed at standardizing different aspects of the data stack, we advocate for a paradigm shift in how data management systems are designed. We believe that by decomposing these systems into a modular stack of reusable components, development can be streamlined while creating a more consistent experience for users. Towards that goal, we describe the state of the art and the principal open source technologies, and highlight open questions and areas where additional research is needed. We hope this work will foster collaboration, motivate further research, and promote a more composable future for data management.

Publisher

Association for Computing Machinery (ACM)
