CCA: Cost-Capacity-Aware Caching for In-Memory Data Analytics Frameworks

Published: 2021-03-26 Issue: 7 Volume: 21 Page: 2321
ISSN: 1424-8220
Container-title: Sensors
Language: en
Short-container-title: Sensors

Author:

Park Seongsoo, Jeong Minseop, Han Hwansoo

Abstract

To process data from IoT and wearable devices, analysis tasks are often offloaded to the cloud. As the volume of sensing data keeps growing, optimizing data analytics frameworks is critical to the performance of processing sensed data. A key approach to speeding up data analytics frameworks in the cloud is caching intermediate data that is used repeatedly in iterative computations. Existing analytics engines implement caching in various ways: some use run-time mechanisms with dynamic profiling, and others rely on programmers to decide which data to cache. Even though caching has long been investigated in computer systems research, recent data analytics frameworks still leave room for optimization. Because sophisticated caching must consider complex execution contexts such as cache capacity, the size of the data to cache, and the victims to evict, a general solution often does not exist for data analytics frameworks. In this paper, we propose an application-specific cost-capacity-aware caching scheme for in-memory data analytics frameworks. We use a cost model, built from multiple representative inputs, and an execution flow analysis, extracted from the DAG schedule, to select the primary caching candidates among intermediate data. Once the caching candidates are determined, the optimal caching is selected automatically during execution, so programmers no longer need to decide manually which intermediate data to cache. We implemented our scheme in Apache Spark and evaluated it experimentally on the HiBench benchmarks. Compared to the caching decisions in the original benchmarks, our scheme improves performance by 27% with sufficient cache memory and by 11% with insufficient cache memory.

Funder

National Research Foundation of Korea

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/21/7/2321/pdf

References: 32 articles.

1. Collection and Processing of Data from Wrist Wearable Devices in Heterogeneous and Multiple-User Scenarios

2. Apache Spark https://spark.apache.org/

3. Apache Tez https://tez.apache.org/

4. Apache Storm http://storm.apache.org/

5. M3R

Cited by 1 article.

1. Memory-Effective Parallel Mining of Incremental Frequent Itemsets Based on Multi-scale; Computer Supported Cooperative Work and Social Computing; 2023