Time-Aware Data Partition Optimization and Heterogeneous Task Scheduling Strategies in Spark Clusters

Author:

Lu SenXing (1, 2), Zhao Mingming (3, 4), Li Chunlin (5, 6, 4), Du Quanbing (6), Luo Youlong (4, 2)

Affiliation:

1. Key Laboratory of AI and Information Processing (Hechi University), Education Department of Guangxi Zhuang Autonomous Region, Hechi 546300, P.R. China

2. Fujian Key Laboratory of Island Monitoring and Ecological Development (Island Research Center, MNR), Fuzhou 363601, P.R. China

3. Yunnan Provincial Rural Energy Engineering Key Laboratory, Yunnan Normal University, Kunming 650500, P.R. China

4. Department of Computer Science, Wuhan University of Technology, Wuhan 430063, P.R. China

5. Guangxi Key Laboratory of Machine Vision and Intelligent Control, Wuzhou University, Wuzhou 543002, P.R. China

6. Henan Key Laboratory of Intelligent Manufacturing Equipment Integration for Superhard Materials, Henan Mechanical and Electrical Vocational College, Zhengzhou 451192, P.R. China

Abstract

The Spark computing framework is an efficient solution for large-scale data processing, but data partitioning and job scheduling remain two major bottlenecks that limit its performance. In the Spark shuffle phase, unbalanced partitioning of intermediate data causes data skew, which lengthens job completion time. To address this problem, this article proposes a balanced partitioning strategy for intermediate data: it considers the characteristics of the intermediate data, establishes a data skew model, and derives a dynamic partitioning algorithm. In heterogeneous Spark clusters, nodes differ in performance and tasks differ in their requirements, so the default task scheduling algorithm cannot schedule efficiently and overall task processing efficiency is low. To address this problem, this article proposes an efficient job scheduling strategy that combines node performance with task requirements and schedules tasks with a greedy algorithm. Experimental results show that the proposed dynamic partitioning algorithm alleviates the loss of task processing efficiency caused by data skew and shortens overall task completion time. The proposed job scheduling strategy completes scheduling efficiently in heterogeneous clusters, allocates jobs to nodes in a balanced manner, reduces overall job completion time, and increases system resource utilization.
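
The abstract names two techniques without giving their details: balanced partitioning of skewed intermediate data, and greedy task scheduling on heterogeneous nodes. The Python sketch below illustrates the general idea of each with textbook greedy heuristics (largest-first assignment to the least-loaded partition, and earliest-finish-time assignment to nodes). It is not the article's algorithm. The function names `balance_keys` and `greedy_schedule`, the per-key size estimates, the task costs, and the relative node speeds are all assumptions made for illustration, and the code does not use any Spark API.

```python
import heapq
from typing import Dict, List, Tuple


def balance_keys(key_sizes: Dict[str, int], num_partitions: int) -> List[List[str]]:
    """Assign keys to partitions, largest key first, always picking the
    partition with the smallest accumulated size (hypothetical helper)."""
    # Min-heap of (current load, partition index).
    heap: List[Tuple[int, int]] = [(0, p) for p in range(num_partitions)]
    heapq.heapify(heap)
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    # Sort by descending size; break ties by key so the result is deterministic.
    for key, size in sorted(key_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, p = heapq.heappop(heap)
        partitions[p].append(key)
        heapq.heappush(heap, (load + size, p))
    return partitions


def greedy_schedule(
    task_costs: List[float], node_speeds: List[float]
) -> Tuple[List[int], float]:
    """Assign each task, most expensive first, to the node on which it would
    finish earliest. Returns (node index per task, makespan).
    Assumes run time = task cost / node speed (an illustrative model)."""
    finish = [0.0] * len(node_speeds)      # accumulated busy time per node
    assignment = [-1] * len(task_costs)    # chosen node for each task
    for t in sorted(range(len(task_costs)), key=lambda i: -task_costs[i]):
        best = min(
            range(len(node_speeds)),
            key=lambda n: finish[n] + task_costs[t] / node_speeds[n],
        )
        finish[best] += task_costs[t] / node_speeds[best]
        assignment[t] = best
    return assignment, max(finish)


if __name__ == "__main__":
    # Four keys with skewed sizes spread over two partitions.
    print(balance_keys({"a": 8, "b": 5, "c": 4, "d": 3}, 2))
    # Three tasks on a fast node (speed 2.0) and a slow node (speed 1.0).
    print(greedy_schedule([4.0, 2.0, 2.0], [2.0, 1.0]))
```

In this toy run the keys are split into partitions of total size 11 and 9 instead of hashing blindly, and the fast node receives two of the three tasks. A real strategy would also need to estimate key sizes at run time and account for data locality, which this sketch ignores.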

Funder

Open Fund of Fujian Key Laboratory of Island Monitoring and Ecological Development

Open Fund of Yunnan Provincial Rural Energy Engineering Key Laboratory, Yunnan Normal University

Open Fund of Henan Key Laboratory of Intelligent Manufacturing Equipment Integration for Superhard Materials

Guangxi Key Laboratory of Machine Vision and Intelligent Control

Open Fund of Key Laboratory of AI and Information Processing (Hechi University), Education Department of Guangxi Zhuang Autonomous Region

National Natural Science Foundation of China

Publisher

Oxford University Press (OUP)

Subject

General Computer Science
