Affiliation:
1. University of Illinois at Chicago, Chicago, IL 60607, USA
2. Open Data Group, Suite 90, 400 Lathrop Avenue, River Forest, IL 60305, USA
Abstract
Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply, given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. By contrast with the existing storage and compute clouds, Sector can manage data not only within a data centre, but also across geographically distributed data centres. Similarly, the Sphere compute cloud supports user-defined functions (UDFs) over data both within and across data centres. As a special case, MapReduce-style programming can be implemented in Sphere by using a Map UDF followed by a Reduce UDF. We describe some experimental studies comparing Sector/Sphere and Hadoop using the Terasort benchmark. In these studies, Sector is approximately twice as fast as Hadoop. Sector/Sphere is open source.
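The idea that MapReduce-style programming is a special case of chaining two UDFs can be illustrated with a small sketch. This is plain Python run in a single process, not Sphere's actual API: the names `run_udf`, `shuffle`, `map_udf`, `reduce_udf` and `word_count` are hypothetical, and the hash-based bucketing step between the two UDFs is an assumption made for this illustration, since the abstract does not describe how records move from the Map UDF to the Reduce UDF.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; not Sphere's API.
Record = Tuple[str, int]
UDF = Callable[[list], list]


def run_udf(udf: UDF, segments: List[list]) -> List[list]:
    """Apply one UDF independently to every data segment."""
    return [udf(segment) for segment in segments]


def shuffle(segments: List[List[Record]], num_buckets: int) -> List[List[Record]]:
    """Assumed intermediate step: route each record to a bucket by its key."""
    buckets: List[List[Record]] = [[] for _ in range(num_buckets)]
    for segment in segments:
        for key, value in segment:
            index = zlib.crc32(key.encode("utf-8")) % num_buckets
            buckets[index].append((key, value))
    return buckets


def map_udf(lines: List[str]) -> List[Record]:
    """Map UDF: emit (word, 1) for every word in the segment."""
    return [(word, 1) for line in lines for word in line.split()]


def reduce_udf(records: List[Record]) -> List[Record]:
    """Reduce UDF: sum the values of all records that share a key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return sorted(totals.items())


def word_count(segments: List[List[str]], num_buckets: int = 2) -> Dict[str, int]:
    """MapReduce-style job expressed as a Map UDF followed by a Reduce UDF."""
    mapped = run_udf(map_udf, segments)
    bucketed = shuffle(mapped, num_buckets)
    reduced = run_udf(reduce_udf, bucketed)
    return dict(pair for bucket in reduced for pair in bucket)
```

Calling `word_count([["a b a"], ["b c"]])` counts words across two segments. Because every record with the same key lands in the same bucket, each Reduce UDF call sees all values for its keys, which is what makes the two-UDF chain behave like MapReduce in this sketch.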
Subject
General Physics and Astronomy, General Engineering, General Mathematics
Cited by
59 articles.