Affiliation:
1. University of California, Merced, Merced, CA
2. University of Florida, Gainesville, FL
Abstract
In this paper we introduce GLADE, a scalable distributed framework for large-scale data analytics. GLADE consists of a simple user interface for defining Generalized Linear Aggregates (GLAs), the fundamental abstraction at the core of GLADE, and a distributed runtime environment that executes GLAs with extensive parallelism.
GLAs are derived from User-Defined Aggregates (UDAs), a relational database extension that allows the user to add specialized aggregates that are executed inside the query processor. GLAs extend the UDA interface with Serialize and Deserialize methods for the state of the aggregate, which distributed computation requires. In a significant departure from UDAs, which can be invoked only through SQL, GLAs give the user direct access to the state of the aggregate, thus allowing the computation of significantly more complex aggregate functions.
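To make the interface concrete, here is a minimal sketch of what a GLA could look like, written in Python for brevity. The abstract names only Serialize and Deserialize; the other method names (init, accumulate, merge, terminate) follow the common UDA convention and are assumptions here, as are the class name `AverageGLA` and the byte layout of the state.

```python
import struct


class AverageGLA:
    """Hypothetical GLA computing an average; the state is (total, count)."""

    def __init__(self):
        # Init: start from an empty state.
        self.total = 0.0
        self.count = 0

    def accumulate(self, value):
        # Accumulate: fold one input value into the state.
        self.total += value
        self.count += 1

    def merge(self, other):
        # Merge: combine with another partial state of the same type.
        self.total += other.total
        self.count += other.count

    def terminate(self):
        # Terminate: derive the final result from the state.
        return self.total / self.count if self.count else None

    def serialize(self):
        # Serialize: pack the state into bytes so it can be sent to another node.
        # The "<dq" layout (float64 + int64) is an assumption of this sketch.
        return struct.pack("<dq", self.total, self.count)

    @classmethod
    def deserialize(cls, payload):
        # Deserialize: rebuild a state from bytes received from another node.
        gla = cls()
        gla.total, gla.count = struct.unpack("<dq", payload)
        return gla


# Two simulated nodes aggregate disjoint partitions of the data.
node_a, node_b = AverageGLA(), AverageGLA()
for v in (1.0, 2.0, 3.0):
    node_a.accumulate(v)
for v in (4.0, 5.0):
    node_b.accumulate(v)

# node_b ships its state as bytes; node_a rebuilds it and merges it in.
received = AverageGLA.deserialize(node_b.serialize())
node_a.merge(received)
result = node_a.terminate()
```

Because the state is an ordinary object, the caller can read `node_a.total` and `node_a.count` directly rather than only the final value, which is the kind of access a SQL-invoked UDA does not offer.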
The GLADE runtime is an execution engine optimized for GLA computation. The runtime takes the user-defined GLA code, compiles it inside the engine, and executes it close to the data, taking advantage of parallelism both within a single machine and across a cluster of computers. This results in the maximum possible execution performance (all our experimental tasks are I/O-bound) and linear scaleup.
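The two levels of parallelism described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the GLADE engine: `MinMaxGLA`, `run_on_node`, and `run_on_cluster` are hypothetical names, nodes are simulated sequentially inside one process, and Python threads show only the structure of the computation rather than real CPU speedup.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor


class MinMaxGLA:
    """Hypothetical GLA tracking the minimum and maximum of a column."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def accumulate(self, value):
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def merge(self, other):
        # An empty partial state contributes nothing.
        if other.lo is None:
            return
        self.accumulate(other.lo)
        self.accumulate(other.hi)

    def terminate(self):
        return (self.lo, self.hi)

    def serialize(self):
        # pickle is used only for brevity in this sketch.
        return pickle.dumps((self.lo, self.hi))

    @classmethod
    def deserialize(cls, payload):
        gla = cls()
        gla.lo, gla.hi = pickle.loads(payload)
        return gla


def run_on_node(chunks, workers=2):
    """Aggregate one node's local chunks in parallel and return its serialized state."""

    def scan(chunk):
        gla = MinMaxGLA()
        for value in chunk:
            gla.accumulate(value)
        return gla

    # Intra-node parallelism: one partial state per chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(scan, chunks))

    # Local merge, so that only one state per node crosses the network.
    local = MinMaxGLA()
    for partial in partials:
        local.merge(partial)
    return local.serialize()


def run_on_cluster(nodes):
    """Coordinator: merge the serialized states of all nodes and finish the GLA."""
    final = MinMaxGLA()
    for chunks in nodes:
        final.merge(MinMaxGLA.deserialize(run_on_node(chunks)))
    return final.terminate()


# Two simulated nodes, each holding two local data chunks.
cluster_data = [
    [[7, 3, 9], [4, 12]],
    [[1, 8], [15, 6, 2]],
]
result = run_on_cluster(cluster_data)  # (1, 15)
```

Only the small serialized state leaves each node, while the scan itself runs where the chunks reside, which is the sense in which the computation stays close to the data.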
Publisher
Association for Computing Machinery (ACM)
Cited by
14 articles.