JAMPI: Efficient Matrix Multiplication in Spark Using Barrier Execution Mode

Published:2020-11-05 Issue:4 Volume:4 Page:32
ISSN:2504-2289
Container-title:Big Data and Cognitive Computing
language:en
Short-container-title:BDCC

Author:

Foldi Tamas, von Csefalvay Chris, Perez Nicolas A.

Abstract

The new barrier mode in Apache Spark allows distributed deep learning training to be embedded as a Spark stage, simplifying the distributed training workflow. In Spark, a task in a stage does not depend on any other task in the same stage, and hence it can be scheduled independently. However, several algorithms require more sophisticated inter-task communication, similar to the MPI paradigm. By combining distributed message passing (using asynchronous network IO), OpenJDK’s new auto-vectorization and Spark’s barrier execution mode, we can add non-map/reduce-based algorithms, such as Cannon’s distributed matrix multiplication, to Spark. We document an efficient distributed matrix multiplication using Cannon’s algorithm, which significantly improves on the performance of the existing MLlib implementation. Used within a barrier task, the algorithm described herein yields a performance increase of up to 24% on a 10,000 × 10,000 square matrix with a significantly lower memory footprint. Applications of efficient matrix multiplication include, among others, accelerating the training and implementation of deep convolutional neural network-based workloads, and thus such efficient algorithms can play a ground-breaking role in the faster and more efficient execution of even the most complicated machine learning tasks.
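As a rough illustration of the block-shifting pattern behind Cannon’s algorithm mentioned in the abstract, the following single-process Python sketch simulates a q × q grid of workers. It is not the authors’ JAMPI implementation and uses neither Spark nor network IO: the block exchanges that would be messages between barrier tasks are modelled as list rotations, and all function names (`cannon_multiply`, `split_blocks`, `join_blocks`, `matmul`, `add_into`) are hypothetical.

```python
# Single-process simulation of Cannon's algorithm on a q x q grid of blocks.
# Assumption: square n x n inputs with n divisible by q. In a distributed
# setting each (i, j) grid cell would be one task; here it is a list entry.

def matmul(a, b):
    """Naive dense product of two list-of-lists matrices."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def add_into(acc, m):
    """Accumulate matrix m into acc in place."""
    for i, row in enumerate(m):
        for j, value in enumerate(row):
            acc[i][j] += value

def split_blocks(m, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    s = len(m) // q
    return [[[row[bj * s:(bj + 1) * s] for row in m[bi * s:(bi + 1) * s]]
             for bj in range(q)] for bi in range(q)]

def join_blocks(blocks):
    """Reassemble a grid of blocks into one matrix."""
    out = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            out.append([v for block in block_row for v in block[r]])
    return out

def cannon_multiply(a, b, q):
    n = len(a)
    if n % q != 0:
        raise ValueError("matrix size must be divisible by the grid size q")
    s = n // q
    A, B = split_blocks(a, q), split_blocks(b, q)
    # Initial skew: block row i of A shifts left by i,
    # block column j of B shifts up by j.
    A = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    B = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]
    C = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Each grid cell multiplies the two blocks it currently holds.
        for i in range(q):
            for j in range(q):
                add_into(C[i][j], matmul(A[i][j], B[i][j]))
        # Shift A one block to the left and B one block up (with wrap-around).
        A = [[A[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        B = [[B[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return join_blocks(C)
```

A quick sanity check is to compare `cannon_multiply(a, b, q)` against `matmul(a, b)` for a small matrix and a few grid sizes. Note that each grid cell holds only one block of each operand at any time, which is the property that keeps the per-task memory footprint small in a distributed version.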

Publisher

MDPI AG

Subject

Artificial Intelligence,Computer Science Applications,Information Systems,Management Information Systems

Link

https://www.mdpi.com/2504-2289/4/4/32/pdf

Cited by 7 articles.

1. Age-of-Event Aware: Sampling Period Optimization in a Three-Stage Wireless Cyber-Physical System With Diverse Parallelisms;IEEE Transactions on Parallel and Distributed Systems;2024-08

2. Lightweight Computational Complexity Stepping Up the NTRU Post-Quantum Algorithm Using Parallel Computing;Symmetry;2023-12-21

3. Stepping up the NTRU-Post Quantum Algorithm Using Parallel Computing;2023-05-10

4. The Tiny-Tasks Granularity Trade-Off: Balancing Overhead Versus Performance in Parallel Systems;IEEE Transactions on Parallel and Distributed Systems;2023-04-01

5. An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP;Electronics;2022-12-25