Abstract
With the rapid growth of Internet data, the performance of big data processing platforms is attracting increasing attention. In Spark, cached data are evicted by the Least Recently Used (LRU) algorithm. LRU cannot account for the cost of cached data, so it may evict important entries. In addition, cached data are placed randomly, with no measure for identifying efficient cache servers. To address these problems, a remote cache management framework (RCM) for the Spark platform is proposed, comprising a cache weight generation module (CWG), a cache replacement module (CREP), and a cache placement module (CPL). CWG establishes initial weights from three main factors: the database query response time, the number of queries, and the data size. CWG then reduces the weight of old data through a time loss function. CREP uses a greedy strategy to maximize the sum of the weights of the cached data. CPL assigns each data item to the best cache server using the Kuhn-Munkres matching algorithm to improve cooperation efficiency. To verify its effectiveness, RCM was implemented on Redis and deployed on eight computing nodes and four cache servers. Three groups of benchmark jobs, PageRank, K-means, and WordCount, were tested. The experimental results confirm that, compared with MCM, SACM, and DMAOM, RCM reduces execution time by up to 42.1%.
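The weighting and replacement steps described above can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the weight expression (response time times query count, divided by size), the exponential form of the time loss function, the `half_life_s` parameter, and the names `CacheEntry`, `initial_weight`, `decayed_weight`, and `greedy_keep` are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class CacheEntry:
    key: str
    response_time_s: float  # time to re-run the database query on a miss
    query_count: int        # how many times the entry has been queried
    size_bytes: int         # space the entry occupies in the cache
    age_s: float            # seconds since the entry was last accessed


def initial_weight(e: CacheEntry) -> float:
    # Assumed form: cost saved by caching, normalised by the space used.
    return e.response_time_s * e.query_count / e.size_bytes


def decayed_weight(e: CacheEntry, half_life_s: float = 600.0) -> float:
    # Assumed time loss function: the weight halves every half_life_s seconds.
    return initial_weight(e) * math.exp(-math.log(2) * e.age_s / half_life_s)


def greedy_keep(entries, capacity_bytes, half_life_s=600.0):
    """Greedy heuristic: keep the highest-weight entries that still fit."""
    ranked = sorted(
        entries, key=lambda e: decayed_weight(e, half_life_s), reverse=True
    )
    kept, used = [], 0
    for e in ranked:
        if used + e.size_bytes <= capacity_bytes:
            kept.append(e.key)
            used += e.size_bytes
    return kept
```

Under these assumptions, an expensive, frequently queried entry outranks a cheap one of the same size, but loses that advantage once it has gone unused for many half-lives. Plain LRU would make the decision on recency alone.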
Funder
Henan Province Science and Technology R&D Project
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
1 article.