Affiliation:
1. Twitter, Inc., San Francisco, California
Abstract
Summingbird is an open-source domain-specific language implemented in Scala and designed to integrate online and batch MapReduce computations in a single framework. Summingbird programs are written using dataflow abstractions such as sources, sinks, and stores, and can run on different execution platforms: Hadoop for batch processing (via Scalding/Cascading) and Storm for online processing. Different execution modes require different bindings for the dataflow abstractions (e.g., HDFS files or message queues for the source) but do not require any changes to the program logic. Furthermore, Summingbird can operate in a hybrid processing mode that transparently integrates batch and online results to efficiently generate up-to-date aggregations over long time spans. The language was designed to improve developer productivity and address pain points in building analytics solutions at Twitter, where the same code often needs to be written twice (once for batch processing and again for online processing) and maintained in parallel indefinitely. Our key insight is that certain algebraic structures (such as semigroups and monoids, whose associative operations allow partial aggregates to be merged) provide the theoretical foundation for integrating batch and online processing seamlessly. As a consequence, Summingbird restricts aggregations to those that can be expressed over these structures, although in practice we have not found this constraint to be overly restrictive for a broad range of analytics tasks at Twitter.
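The hybrid mode described above depends on merging a batch aggregate with an online aggregate. A minimal plain-Scala sketch of why an associative operation makes this work is shown below. It does not use Summingbird's or Algebird's APIs: the `Semigroup` trait, `sumByKey`, `mergeMaps`, the word-count logic, and the sample events are all hypothetical stand-ins. The job logic is written once, a "platform" only decides which events it runs over, and the hybrid result is checked against a single pass over all events.

```scala
// Hypothetical stand-in for an algebraic structure (not the Summingbird or
// Algebird API): a semigroup is a type with an associative `plus`.
trait Semigroup[V] {
  def plus(a: V, b: V): V
}

object HybridSketch {
  // Counts form a semigroup under addition.
  implicit val longSemigroup: Semigroup[Long] = new Semigroup[Long] {
    def plus(a: Long, b: Long): Long = a + b
  }

  // Merge two keyed partial aggregates, combining values for shared keys
  // in order (x first, then y), so only associativity is required.
  def mergeMaps[K, V](x: Map[K, V], y: Map[K, V])(implicit sg: Semigroup[V]): Map[K, V] =
    y.foldLeft(x) { case (acc, (k, v)) =>
      acc.updated(k, acc.get(k).map(sg.plus(_, v)).getOrElse(v))
    }

  // Job logic written once: map each event to (key, value) pairs and sum by key.
  // Where the events come from (historical files vs. a live stream) is not its concern.
  def sumByKey[E, K, V](events: Seq[E])(f: E => Seq[(K, V)])(implicit sg: Semigroup[V]): Map[K, V] =
    events.flatMap(f).foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
      mergeMaps(acc, Map(k -> v))
    }

  // Hypothetical job: count lowercase words in tweets.
  val wordCount: String => Seq[(String, Long)] =
    tweet => tweet.split("\\s+").toSeq.filter(_.nonEmpty).map(w => (w.toLowerCase, 1L))

  // Hypothetical data: events covered by the last batch run, and events since then.
  val history: Seq[String] = Seq("hello world", "hello summingbird")
  val recent: Seq[String] = Seq("hello storm")

  val batchResult: Map[String, Long] = sumByKey(history)(wordCount)
  val onlineResult: Map[String, Long] = sumByKey(recent)(wordCount)

  // Hybrid view: combine the two partial aggregates with the same operation.
  val hybridResult: Map[String, Long] = mergeMaps(batchResult, onlineResult)

  // Reference answer: one pass over every event.
  val fullResult: Map[String, Long] = sumByKey(history ++ recent)(wordCount)

  def main(args: Array[String]): Unit = {
    println(hybridResult)
    println(hybridResult == fullResult)
  }
}
```

Because `plus` is associative, summing the batch prefix and the online suffix separately and then merging gives the same totals as a single pass. An aggregation without this property (for example, a plain mean stored as a single `Double`) could not be merged this way, which illustrates the kind of constraint the abstract mentions.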
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
76 articles.
1. Die NoSQL-Toolbox: Die NoSQL-Landschaft im Überblick [The NoSQL Toolbox: An Overview of the NoSQL Landscape];Schnelles und skalierbares Cloud-Datenmanagement [Fast and Scalable Cloud Data Management];2024
2. Practical Storage-Compute Elasticity for Stream Data Processing;Proceedings of the 24th International Middleware Conference: Industrial Track;2023-12-11
3. Pravega;Proceedings of the 24th International Middleware Conference;2023-11-27
4. A Unified Stream and Batch Graph Computing Model for Community Detection;Computer Supported Cooperative Work and Social Computing;2023
5. A dynamic feature selection and intelligent model serving for hybrid batch-stream processing;Knowledge-Based Systems;2022-11