Collaborative Learning Based Straggler Prevention in Large-Scale Distributed Computing Framework

Published: 2021-05-23 Volume: 2021 Page: 1-9
ISSN: 1939-0122
Container-title: Security and Communication Networks
Language: en

Author:

Deshmukh Shyam¹, Thirupathi Rao Komati¹, Shabaz Mohammad²

Affiliation:

1. Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur 522502, AP, India

2. Arba Minch University, Arba Minch, Ethiopia

Abstract

Modern big data applications tend to prefer a cluster computing approach because they rely on a distributed computing framework that serves users' jobs on demand. The framework processes jobs rapidly by subdividing them into tasks that execute in parallel. Because of the complex environment and hardware and software issues, some tasks run slowly and delay job completion; such slow tasks are known as stragglers. Straggling nodes, caused by factors such as shared resources, heavy system load, or hardware issues, prolong job execution time and are a bottleneck for improving the performance of distributed computing frameworks. Many state-of-the-art approaches use independent models per node and workload. As nodes and workloads increase, the number of models increases with them, and even with a large number of nodes, not every node can capture the stragglers, because it may not have sufficient training data on straggler patterns, which yields suboptimal straggler prediction. To alleviate these problems, we propose a novel collaborative learning-based approach for straggler prediction using the alternating direction method of multipliers (ADMM), which is resource-efficient and learns to mitigate stragglers without moving data to a centralized location. The proposed framework shares information among the various models, which allows us to use larger training data and reduce training time by avoiding data transfer. We rigorously evaluate the proposed method on various datasets and obtain high accuracy.
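The idea of learning collaboratively without centralizing data can be illustrated with consensus ADMM in its standard scaled form. The sketch below is not the authors' implementation: it assumes a least-squares loss as a stand-in for the straggler predictor, synthetic per-node data, and hypothetical names (`solve`, `consensus_admm`, `make_nodes`, `rho`). Each node keeps its own `(X, y)` and only its weight vector and dual variable take part in the averaging step.

```python
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def consensus_admm(node_data, dim, rho=1.0, iters=200):
    """Fit one shared weight vector z from per-node (X, y) data.

    Assumption: each node minimises 0.5 * ||X w - y||^2 on its own data.
    Raw samples never leave the node; only w_i and u_i are combined.
    """
    n = len(node_data)
    grams, xty = [], []
    for X, y in node_data:
        # Per-node statistics: X^T X + rho * I and X^T y.
        grams.append([[sum(r[j] * r[k] for r in X) + (rho if j == k else 0.0)
                       for k in range(dim)] for j in range(dim)])
        xty.append([sum(r[j] * t for r, t in zip(X, y)) for j in range(dim)])
    z = [0.0] * dim
    u = [[0.0] * dim for _ in range(n)]
    for _ in range(iters):
        # Local step: w_i = argmin f_i(w) + (rho / 2) * ||w - z + u_i||^2.
        w = [solve(grams[i],
                   [xty[i][j] + rho * (z[j] - u[i][j]) for j in range(dim)])
             for i in range(n)]
        # Consensus step: average of w_i + u_i over all nodes.
        z = [sum(w[i][j] + u[i][j] for i in range(n)) / n for j in range(dim)]
        # Dual step: accumulate each node's disagreement with the consensus.
        u = [[u[i][j] + w[i][j] - z[j] for j in range(dim)] for i in range(n)]
    return z


def make_nodes(true_w, n_nodes=4, rows=30, seed=0):
    """Synthetic, noise-free data split across nodes (illustration only)."""
    rng = random.Random(seed)
    nodes = []
    for _ in range(n_nodes):
        X = [[rng.uniform(-1.0, 1.0) for _ in true_w] for _ in range(rows)]
        y = [sum(x * w for x, w in zip(row, true_w)) for row in X]
        nodes.append((X, y))
    return nodes
```

Calling `consensus_admm(make_nodes([2.0, -1.0, 0.5]), dim=3)` returns a single weight vector that all nodes agree on. A classifier for straggler prediction would replace the closed-form local step with a solver for its own loss; the consensus and dual steps would stay the same.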

Publisher

Hindawi Limited

Subject

Computer Networks and Communications,Information Systems

Link

http://downloads.hindawi.com/journals/scn/2021/8340925.pdf

Cited by 51 articles.

1. A novel deep learning framework based swin transformer for dermal cancer cell classification;Engineering Applications of Artificial Intelligence;2024-07

2. Harnessing K-means Clustering to Decode Communication Patterns in Modern Electronic Devices;Journal of Machine and Computing;2024-01-05

3. Intrusion Detection in Internet of Things Systems: A Feature Extraction with Naive Bayes Classifier Approach;Journal of Machine and Computing;2024-01-05

4. DPro-SM – A distributed framework for proactive straggler mitigation using LSTM;Heliyon;2024-01

5. Precision Cardiac Risk Evaluation through Flexible Classification Mining Strategies;2023 IEEE International Conference on ICT in Business Industry & Government (ICTBIG);2023-12-08