M‐DFCPP: A runtime library for multi‐machine dataflow computing

Published: 2024-08-07
ISSN: 1532-0626
Container-title: Concurrency and Computation: Practice and Experience
Language: en
Short-container-title: Concurrency and Computation

Author:

Luo Qiuming¹²³, Liu Senhong¹, Huang Jinke¹, Li Jinrong¹

Affiliation:

1. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China

2. SKT Group Guangdong Province Key Laboratory of Popular High‐Performance Computers, Shenzhen, China

3. SKT Group Guangdong Province Engineering Center of China‐made High Performance Data Computing System, Shenzhen, China

Abstract

This article designs and implements DFCPP, a runtime library for general dataflow programming (Luo Q, Huang J, Li J, Du Z. Proceedings of the 52nd International Conference on Parallel Processing Workshops. ACM; 2023:145‐152.), and builds upon it to design and implement M‐DFCPP, a multi‐machine C++ dataflow library. Compared with existing dataflow programming environments, DFCPP offers a user‐friendly interface and richer expressive capabilities (Luo Q, Huang J, Li J, Du Z. Proceedings of the 52nd International Conference on Parallel Processing Workshops. ACM; 2023:145‐152.), enabling the representation of several types of dataflow actor tasks (static, dynamic, and conditional). In addition, DFCPP addresses memory management and task scheduling for non‐uniform memory access architectures, issues to which other dataflow libraries pay little attention. M‐DFCPP extends the capability of current dataflow runtime libraries (DFCPP, taskflow, openstream, etc.) to multi‐machine computing while keeping its API compatible with DFCPP. M‐DFCPP adopts the concepts of master and follower (Dean J, Ghemawat S. Commun ACM. 2008;51(1):107‐113; Ghemawat S, Gobioff H, Leung ST. ACM SIGOPS Operating Systems Review. ACM; 2003:29‐43.), which form a work‐sharing framework, as in many multi‐machine systems. To move to the M‐DFCPP framework, a filtering layer is inserted into the original DFCPP, transforming it into followers that can cooperate with each other. The master consists of modules for scheduling, data processing, graph partitioning, state management, and so forth. In benchmark tests on workloads whose directed acyclic graph topologies are binary trees and random graphs, DFCPP demonstrated performance improvements of 20% and 8%, respectively, compared to the second fastest library. M‐DFCPP consistently exhibits outstanding performance across varying levels of concurrency and task workloads, achieving a maximum speedup of more than 20 over DFCPP when the task parallelism exceeds 5000 on 32 nodes. Moreover, M‐DFCPP, as a runtime library supporting multi‐node dataflow computation, is compared with MPI, a runtime library supporting multi‐node control flow computation.
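
The dataflow execution model the abstract refers to can be illustrated with a small sketch: a task fires only after every task that produces its inputs has finished. The code below is a minimal, single‐threaded illustration under stated assumptions, not the DFCPP or M‐DFCPP API. The names `Task`, `Graph`, `add_task`, `add_edge`, `run`, and `run_binary_tree_demo` are hypothetical and were chosen for this example only; scheduling across threads, NUMA nodes, or machines is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical illustration only: none of these names come from DFCPP or M-DFCPP.
struct Task {
    std::function<void()> work;           // body to run once all inputs are ready
    std::vector<std::size_t> successors;  // tasks that consume this task's output
    std::size_t pending_inputs;           // number of producers feeding this task
};

class Graph {
public:
    // Registers a task and returns its id.
    std::size_t add_task(std::function<void()> work) {
        tasks_.push_back(Task{std::move(work), {}, 0});
        return tasks_.size() - 1;
    }

    // Declares that `consumer` needs data produced by `producer`.
    void add_edge(std::size_t producer, std::size_t consumer) {
        assert(producer < tasks_.size() && consumer < tasks_.size());
        tasks_[producer].successors.push_back(consumer);
        ++tasks_[consumer].pending_inputs;
    }

    // Runs every task in data-readiness order (sequentially, for clarity)
    // and returns how many tasks were executed.
    std::size_t run() const {
        std::vector<std::size_t> pending(tasks_.size());
        std::queue<std::size_t> ready;
        for (std::size_t i = 0; i < tasks_.size(); ++i) {
            pending[i] = tasks_[i].pending_inputs;
            if (pending[i] == 0) ready.push(i);  // source tasks can fire at once
        }
        std::size_t executed = 0;
        while (!ready.empty()) {
            std::size_t id = ready.front();
            ready.pop();
            tasks_[id].work();
            ++executed;
            // Finishing a task may make its consumers ready.
            for (std::size_t next : tasks_[id].successors) {
                if (--pending[next] == 0) ready.push(next);
            }
        }
        return executed;
    }

private:
    std::vector<Task> tasks_;
};

// A tiny binary-tree reduction: two leaves produce values, the root sums them.
// The root is registered first to show that readiness, not insertion order,
// decides when a task runs.
int run_binary_tree_demo() {
    int left = 0, right = 0, sum = 0;
    Graph g;
    std::size_t root = g.add_task([&] { sum = left + right; });
    std::size_t a = g.add_task([&] { left = 20; });
    std::size_t b = g.add_task([&] { right = 22; });
    g.add_edge(a, root);
    g.add_edge(b, root);
    g.run();
    return sum;
}
```

In this sketch, a multi‐machine variant would need some component to decide which machine runs each ready task and to move data along the edges; the abstract attributes those roles to the master's scheduling, data processing, and graph partitioning modules, whose actual design is not shown here.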

Funder

Science and Technology Foundation of Shenzhen City

National Key Research and Development Program of China

Publisher

Wiley

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/cpe.8248

Reference: 27 articles.

1. OpenMP: an industry standard API for shared-memory programming

2. X10