Authors: Abuelsoud M. M., Kogutenko A. A., Naveen
Abstract
Efficiently processing vast and expanding data volumes is a pressing challenge. Traditional high-performance computers, utilizing distributed-memory architecture and a message-passing model, grapple with synchronization issues, hampering their ability to keep up with the growing demands. Remote Memory Access (RMA), often referred to as one-sided MPI communications, offers a solution by allowing a process to directly access another process’s memory, eliminating the need for message exchange and significantly boosting performance. Unfortunately, the existing MPI RMA standard lacks a collective operation interface, limiting efficiency. To overcome this constraint, we introduce an algorithm design that enables efficient parallelizable collective operations within the RMA framework. Our study focuses primarily on the advantages of collective operations, using the broadcast algorithm as a case study. Our implementations surpass traditional methods, highlighting the promising potential of this technique, as indicated by initial performance tests.
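To make the one-sided model concrete, the following is a minimal sketch of an RMA-style broadcast using standard MPI-2 calls (`MPI_Win_create`, `MPI_Get`, `MPI_Win_fence`): the root exposes its buffer through a window and every other rank pulls the value directly, with no matching receives posted. This is a flat illustrative pull, not the parallelized broadcast algorithm the paper proposes.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int data = (rank == 0) ? 42 : 0;   /* root holds the payload */

    /* Expose each rank's buffer as an RMA window. */
    MPI_Win win;
    MPI_Win_create(&data, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);             /* open the access epoch */
    if (rank != 0)
        /* Non-root ranks read the root's memory directly:
         * no send/recv pair, no message matching. */
        MPI_Get(&data, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
    MPI_Win_fence(0, win);             /* close the epoch; data is now valid */

    printf("rank %d has %d\n", rank, data);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

A tree-structured variant, where ranks that have already received the value expose it for others to pull, is the kind of collective the abstract argues the RMA interface currently lacks.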