An innovative parameter optimization of Spark Streaming based on D3QN with Gaussian process regression

Published:2023 Issue:8 Volume:20 Page:14464-14486
ISSN:1551-0018
Container-title:Mathematical Biosciences and Engineering
Short-container-title:MBE

Author:

Zhang Hong¹, Xu Zhenchao¹, Wang Yunxiang¹, Shen Yupeng²

Affiliation:

1. School of Cyber Security and Computer, Hebei University, Baoding, China

2. Bureau of Geophysical Prospecting, Baoding, China

Abstract

Spark Streaming, a computing framework built on Spark, is widely used to process streaming data such as social media data, IoT sensor data and web logs. Because streaming data analysis is so widely used, performance optimization for Spark Streaming has become a popular research topic. Existing methods for improving Spark Streaming's performance include task scheduling, resource allocation and data skew optimization, and they rely primarily on manually tuning the parameter configuration. However, adjusting more than 200 parameters through repeated debugging is challenging and inefficient. In this paper, we propose an improved dueling double deep Q-network (D3QN) technique for parameter tuning that can significantly improve the performance of Spark Streaming. The approach combines reinforcement learning with Gaussian process regression to reduce the number of iterations and speed up convergence. The experimental results show that the dueling double DQN method with Gaussian process regression improves performance by up to 30.24%.
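
For readers unfamiliar with the term, the two standard ingredients of a dueling double DQN can be sketched in a few lines of NumPy. This is a minimal illustration of the textbook formulation only, not the authors' improved method: it does not include the Gaussian process regression component or any Spark Streaming parameters, and the function names `dueling_q` and `double_dqn_target` are hypothetical.

```python
import numpy as np

def dueling_q(value, advantage):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    return value + advantage - advantage.mean(axis=-1, keepdims=True)

def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    best_action = np.argmax(q_online_next, axis=-1)
    next_q = np.take_along_axis(
        q_target_next, best_action[..., None], axis=-1
    ).squeeze(-1)
    # Terminal transitions (done == 1) contribute no bootstrapped value.
    return reward + gamma * (1.0 - done) * next_q

# Toy batch of one state with two actions (illustrative numbers only).
q = dueling_q(np.array([[1.0]]), np.array([[1.0, 3.0]]))
y = double_dqn_target(
    reward=np.array([1.0]),
    done=np.array([0.0]),
    q_online_next=np.array([[0.1, 0.9]]),
    q_target_next=np.array([[5.0, 2.0]]),
    gamma=0.5,
)
```

In a tuning setting, an action would presumably correspond to adjusting a configuration parameter and the reward to an observed performance change, but that mapping is an assumption here and is not specified in the abstract.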

Publisher

American Institute of Mathematical Sciences (AIMS)

Subject

Applied Mathematics, Computational Mathematics, General Agricultural and Biological Sciences, Modeling and Simulation, General Medicine

References: 36 articles.

1. Apache storm. Available from: https://storm.apache.org/.

2. Apache spark streaming. Available from: https://spark.apache.org/docs/latest/streaming-programming-guide.html.

3. Apache flink. Available from: https://flink.apache.org/.

4. D. Cheng, X. Zhou, Y. Wang, C. Jiang, Adaptive scheduling parallel jobs with dynamic batching in spark streaming, IEEE Trans. Parallel Distrib. Syst., 29 (2018), 2672–2685. https://doi.org/10.1109/TPDS.2018.2846234

5. H. Du, P. Han, Q. Xiang, S. Huang, Monkeyking: Adaptive parameter tuning on big data platforms with deep reinforcement learning, Big Data, 8 (2020), 270–290.

Cited by 1 article.

1. A novel machine learning-based framework for channel bandwidth allocation and optimization in distributed computing environments;EURASIP Journal on Wireless Communications and Networking;2023-09-28