
ApproxHadoop

Published:2015-05-12 Issue:4 Volume:50 Page:383-397
ISSN:0362-1340
Container-title:ACM SIGPLAN Notices
language:en
Short-container-title:SIGPLAN Not.

Author:

Goiri Inigo¹,Bianchini Ricardo¹,Nagarakatte Santosh²,Nguyen Thu D.²

Affiliation:

1. Microsoft Research, Redmond, WA, USA

2. Rutgers University, New Brunswick, NJ, USA

Abstract

We propose and evaluate a framework for creating and running approximation-enabled MapReduce programs. Specifically, we propose approximation mechanisms that fit naturally into the MapReduce paradigm, including input data sampling, task dropping, and accepting and running a precise and a user-defined approximate version of the MapReduce code. We then show how to leverage statistical theories to compute error bounds for popular classes of MapReduce programs when approximating with input data sampling and/or task dropping. We implement the proposed mechanisms and error bound estimations in a prototype system called ApproxHadoop. Our evaluation uses MapReduce applications from different domains, including data analytics, scientific computing, video encoding, and machine learning. Our results show that ApproxHadoop can significantly reduce application execution time and/or energy consumption when the user is willing to tolerate small errors. For example, ApproxHadoop can reduce runtimes by up to 32x when the user can tolerate an error of 1% with 95% confidence. We conclude that our framework and system can make approximation easily accessible to many application domains using the MapReduce model.
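The abstract mentions computing error bounds when approximating with input data sampling. As a rough illustration of that general idea only, the sketch below estimates a sum from a simple random sample and reports a 95% confidence half-width using a normal approximation with a finite population correction. This is an assumption-laden example, not the estimator used by ApproxHadoop; the function name `approx_sum` and its parameters are hypothetical.

```python
import math
import random
import statistics


def approx_sum(values, sample_fraction, z=1.96, seed=0):
    """Estimate sum(values) from a simple random sample.

    Returns (estimate, half_width), where half_width is the approximate
    confidence half-width for the given z-score (1.96 ~ 95% confidence).
    Hypothetical helper for illustration; not ApproxHadoop's API.
    """
    rng = random.Random(seed)
    n_total = len(values)
    # Need at least two items to compute a sample variance.
    n_sample = max(2, int(n_total * sample_fraction))
    sample = rng.sample(values, n_sample)

    mean = statistics.fmean(sample)
    var = statistics.variance(sample)  # unbiased sample variance

    # Scale the sample mean up to the full population.
    estimate = n_total * mean
    # Finite population correction: the error shrinks to zero as the
    # sample approaches the whole input.
    fpc = 1 - n_sample / n_total
    half_width = z * n_total * math.sqrt(var / n_sample * fpc)
    return estimate, half_width


if __name__ == "__main__":
    data = list(range(1, 10001))
    est, hw = approx_sum(data, sample_fraction=0.1)
    print(f"estimate = {est:.0f} +/- {hw:.0f} (exact = {sum(data)})")
```

Sampling a smaller fraction of the input does less work but widens the interval, which is the kind of runtime versus error trade-off the abstract describes.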

Funder

NSF

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design,Software

Link

https://dl.acm.org/doi/pdf/10.1145/2775054.2694351

References: 47 articles.

1. Apache Hadoop. http://hadoop.apache.org.

2. Apache Mahout. http://mahout.apache.org.

3. Apache Nutch. http://nutch.apache.org.

4. BlinkDB

Cited by 13 articles.

1. Poster: (Re)-Configuration Framework for Mission-Critical Applications in Edge Environments;Proceedings of the Eighth ACM/IEEE Symposium on Edge Computing;2023-12-06

2. Poster: Processing of Latency- and Deadline-aware Big Data Approaches at the Edge;Proceedings of the Eighth ACM/IEEE Symposium on Edge Computing;2023-12-06

3. A Survey of FPGA Optimization Methods for Data Center Energy Efficiency;IEEE Transactions on Sustainable Computing;2023-07-01

4. Approximate High-Performance Computing: A Fast and Energy-Efficient Computing Paradigm in the Post-Moore Era;IT Professional;2023-03

5. Cloud Big Data Mining and Analytics: Bringing Greenness and Acceleration in the Cloud;Machine Learning for Data Science Handbook;2023