Performance Evaluation Analysis of Spark Streaming Backpressure for Data-Intensive Pipelines

Published: 2022-06-23 Issue: 13 Volume: 22 Page: 4756
ISSN: 1424-8220
Container-title: Sensors
Language: en
Short-container-title: Sensors

Author:

Matteussi Kassiano J., dos Anjos Julio C. S., Leithardt Valderi R. Q., Geyer Claudio F. R.

Abstract

A significant rise in the adoption of streaming applications has changed the decision-making processes in the last decade. This movement has led to the emergence of several Big Data technologies for in-memory processing, such as the systems Apache Storm, Spark, Heron, Samza, Flink, and others. Spark Streaming, a widespread open-source implementation, processes data-intensive applications that often require large amounts of memory. However, Spark Unified Memory Manager cannot properly manage sudden or intensive data surges and their related in-memory caching needs, resulting in performance and throughput degradation, high latency, a large number of garbage collection operations, out-of-memory issues, and data loss. This work presents a comprehensive performance evaluation of Spark Streaming backpressure to investigate the hypothesis that it could support data-intensive pipelines under specific pressure requirements. The results reveal that backpressure is suitable only for small and medium pipelines for stateless and stateful applications. Furthermore, it points out the Spark Streaming limitations that lead to in-memory-based issues for data-intensive pipelines and stateful applications. In addition, the work indicates potential solutions.

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/22/13/4756/pdf

References: 34 articles.

1. Machine Learning and Big Data Analytics Paradigms: Analysis, Applications and Challenges; Hassanien, 2020

2. ENERDGE: Distributed Energy-Aware Resource Allocation at the Edge

3. Dynamic memory-aware scheduling in spark computing environment

4. Latency-Aware Strategies for Deploying Data Stream Processing Applications on Large Cloud-Edge Infrastructure

5. Storm@twitter

Cited by 7 articles.

1. PAC: A monitoring framework for performance analysis of compression algorithms in Spark; Future Generation Computer Systems; 2024-08

2. Towards a Decentralized Blockchain-Based Resource Monitoring Solution For Distributed Environments; Journal of Internet Services and Applications; 2024-03-07

3. Adaptive Scheduling Framework of Streaming Applications based on Resource Demand Prediction with Hybrid Algorithms; Journal of Grid Computing; 2024-03

4. A Novel Multi-Task Performance Prediction Model for Spark; Applied Sciences; 2023-11-11

5. A Survey on Collaborative Learning for Intelligent Autonomous Systems; ACM Computing Surveys; 2023-11-10