
The family of MapReduce and large-scale data processing systems

Published: 2013-10 Issue: 1 Volume: 46 Pages: 1-44
ISSN: 0360-0300
Container-title: ACM Computing Surveys
Language: en
Short-container-title: ACM Comput. Surv.

Author:

Sakr Sherif¹, Liu Anna¹, Fayoumi Ayman G.²

Affiliation:

1. NICTA and University of New South Wales, Sydney, Australia

2. King Abdulaziz University, Saudi Arabia

Abstract

In the last two decades, the continuous increase in computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations, which many follow-up research efforts have tackled since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining momentum in both the research and industrial communities. We also cover a set of systems that provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that share some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some future research directions for implementing the next generation of MapReduce-like solutions.
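
As a rough illustration of the programming model the abstract describes, the following is a minimal single-process sketch in Python. It is not taken from the article and is not the API of any real MapReduce framework: the names `map_reduce`, `word_count_mapper`, and `word_count_reducer` are hypothetical, and the distribution, scheduling, and fault tolerance that a real framework handles are deliberately left out.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical in-memory driver: map, group by key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    # Map phase: each input record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Grouping step (the "shuffle"): collect values under their key.
            groups[key].append(value)
    # Reduce phase: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line: str) -> Iterable[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in the line."""
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word: str, counts: List[int]) -> int:
    """Sum the partial counts emitted for one word."""
    return sum(counts)


lines = ["map reduce map", "reduce shuffle map"]
counts = map_reduce(lines, word_count_mapper, word_count_reducer)
print(counts)
```

The application code here consists only of the mapper and the reducer; in this sketch the driver function stands in for everything the framework would otherwise do across a cluster.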

Funder

Australian Government

ICT Centre of Excellence program

Australian Research Council

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science, Theoretical Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/2522968.2522979

References: 134 articles.

1. SW-Store: a vertically partitioned DBMS for Semantic Web data management

2. HadoopDB

3. HadoopDB in action

4. Optimizing joins in a map-reduce environment

5. Fuzzy Joins Using MapReduce

Cited by 113 articles.

1. Efficiency Assessment of MapReduce Algorithm on a Serverless Platform;2023 IEEE 3rd International Conference on Electronic Technology, Communication and Information (ICETCI);2023-05-26

2. Streaming State Validation Technique for Textual Big Data Using Apache Flink;Computational Linguistics and Intelligent Text Processing;2023

3. Handling Iterations in Distributed Dataflow Systems;ACM Computing Surveys;2022-12-31

4. Toward a prediction approach based on deep learning in Big Data analytics;Neural Computing and Applications;2022-11-13

5. An Introduction to Big Data Analytics;Special Publications;2022-09-07