A Model and Survey of Distributed Data-Intensive Systems

Author:

Margara Alessandro¹, Cugola Gianpaolo¹, Felicioni Nicolò¹, Cilloni Stefano¹

Affiliation:

1. Politecnico di Milano, Italy

Abstract

Data is a precious resource in today’s society, and it is generated at an unprecedented and constantly growing pace. The need to store, analyze, and make data promptly available to a multitude of users introduces formidable challenges in modern software platforms. These challenges have radically impacted the research fields that gravitate around data management and processing, with the introduction of distributed data-intensive systems that offer innovative programming models and implementation strategies to handle data characteristics such as its volume, the rate at which it is produced, its heterogeneity, and its distribution. Each data-intensive system brings its specific choices in terms of data model, usage assumptions, synchronization, processing strategy, deployment, and guarantees in terms of consistency, fault tolerance, and ordering. Yet, the problems data-intensive systems face and the solutions they propose frequently overlap. This article proposes a unifying model that dissects the core functionalities of data-intensive systems, and discusses alternative design and implementation strategies, pointing out their assumptions and implications. The model offers a common ground to understand and compare highly heterogeneous solutions, with the potential of fostering cross-fertilization across research communities. We apply our model by classifying tens of systems: an exercise that leads to interesting observations on the current trends in the domain of data-intensive systems and suggests open research directions.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science, Theoretical Computer Science

