A TTL-based Approach for Data Aggregation in Geo-distributed Streaming Analytics

Author:

Kumar Dhruv (1), Li Jian (2), Chandra Abhishek (1), Sitaraman Ramesh (2)

Affiliation:

1. University of Minnesota, Twin Cities, Minneapolis, MN, USA

2. University of Massachusetts, Amherst, Amherst, MA, USA

Abstract

Streaming analytics require real-time aggregation and processing of geographically distributed data streams continuously over time. The typical analytics infrastructure for processing such streams follows a hub-and-spoke model, comprising multiple edges connected to a center by a wide-area network (WAN). The aggregation of such streams often requires that the results be available at the center within a certain acceptable delay bound. Further, the WAN bandwidth available between the edges and the center is often scarce or expensive, requiring that the traffic between the edges and the center be minimized. We propose a novel Time-to-Live (TTL)-based mechanism for real-time aggregation that provably optimizes both delay and traffic, providing a theoretical basis for understanding the delay-traffic tradeoff that is fundamental to streaming analytics. Our TTL-based optimization model provides analytical answers to three questions: how much aggregation should be performed at the edge versus the center, how much delay can be incurred at the edges, and how the edge-to-center bandwidth must be apportioned across applications with different delay requirements. To evaluate our approach, we implement our TTL-based aggregation mechanism in Apache Flink, a popular stream analytics framework. We deploy our Flink implementation in a hub-and-spoke architecture on geo-distributed Amazon EC2 data centers and a WAN-emulated local testbed, and run aggregation tasks for realistic workloads derived from extensive Akamai and Twitter traces. The delay-traffic tradeoff achieved by our Flink implementation agrees closely with the theoretical predictions of our model. We show that by deriving the optimal TTLs using our model, our system can achieve a "sweet spot" where both delay and traffic are minimized, in comparison to traditional aggregation schemes such as batching and streaming.
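The delay-traffic tradeoff described in the abstract can be illustrated with a small sketch. The code below is not the authors' Flink implementation or their optimization model; it is a toy edge aggregator written under stated assumptions: the aggregate is a per-key sum, a timer of length `ttl` starts when a key first arrives at the edge, records for that key arriving before expiry are merged locally, and one message is sent to the center when the timer expires. The class name `TtlEdgeAggregator`, its methods, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TtlEdgeAggregator:
    """Hypothetical edge node that sums values per key under a fixed TTL."""

    ttl: float
    # key -> (expiry_time, running_sum); assumption: one timer per buffered key
    _buffer: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    # messages sent to the center as (send_time, key, partial_sum)
    sent: List[Tuple[float, str, float]] = field(default_factory=list)

    def _flush_expired(self, now: float) -> None:
        # Send every buffered aggregate whose TTL has run out by `now`.
        expired = [k for k, (expiry, _) in self._buffer.items() if expiry <= now]
        for key in expired:
            expiry, total = self._buffer.pop(key)
            self.sent.append((expiry, key, total))

    def on_record(self, now: float, key: str, value: float) -> None:
        self._flush_expired(now)
        if self.ttl <= 0:
            # TTL of zero degenerates to pure streaming: forward immediately.
            self.sent.append((now, key, value))
        elif key in self._buffer:
            # Key is already buffered: merge locally, keep the original expiry.
            expiry, total = self._buffer[key]
            self._buffer[key] = (expiry, total + value)
        else:
            # First arrival of this key: start its TTL timer.
            self._buffer[key] = (now + self.ttl, value)

    def close(self) -> None:
        # Drain whatever is still buffered at the end of the stream.
        self._flush_expired(float("inf"))


def run(ttl: float, records: List[Tuple[float, str, float]]):
    agg = TtlEdgeAggregator(ttl=ttl)
    for timestamp, key, value in records:
        agg.on_record(timestamp, key, value)
    agg.close()
    return agg.sent


# Made-up records: (arrival_time_seconds, key, value)
records = [(0.0, "a", 1.0), (0.5, "a", 2.0), (0.6, "b", 5.0), (2.0, "a", 4.0)]
streaming = run(0.0, records)  # one message per record, no added delay
ttl_based = run(1.0, records)  # fewer messages, up to 1 s of added edge delay
print(len(streaming), len(ttl_based))
```

On this toy input the streaming run sends four messages and the TTL run sends three, because the two records for key "a" that arrive within one second of each other are merged at the edge. A larger `ttl` merges more records per message at the cost of more delay, which is the tradeoff the paper's model is said to optimize.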

Funder

Army Research Laboratory

National Science Foundation

U.K. Ministry of Defense

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications; Hardware and Architecture; Safety, Risk, Reliability and Quality; Computer Science (miscellaneous)

Cited by 6 articles.

1. A survey on transactional stream processing. The VLDB Journal, 2023-09-27.

2. An overview of analysis methods and evaluation results for caching strategies. Computer Networks, 2023-06.

3. Multi-Stage Geo-Distributed Data Aggregation With Coordinated Computation and Communication in Edge Compute First Networking. Journal of Lightwave Technology, 2023-04-15.

4. Network-aware worker placement for wide-area streaming analytics. Future Generation Computer Systems, 2022-11.

5. DLion. Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing, 2021-06-21.
