Affiliation:
1. Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
2. Los Alamos National Laboratory, Los Alamos, New Mexico, USA
Abstract
The Cray HPE Slingshot 11 network is used on the new exascale systems arriving at U.S. Department of Energy (DOE) laboratories (e.g., Frontier, Aurora, Perlmutter). Supporting this network is therefore an important capability for meeting the needs of exascale applications. This article highlights recent work on the infrastructure that enables Open MPI to run efficiently on these new platforms. A key component of this effort is the development of a new Open Fabrics Interface (OFI) provider, LinkX. We discuss the design and development of enhancements that take advantage of the new Slingshot 11 network and AMD GPUs. We include performance data from tests on the Frontier supercomputer using synthetic communication benchmarks, with the vendor-provided MPI as a baseline for comparison. The tests demonstrate full functionality of Open MPI on the system, and initial results show favorable performance compared to the highly tuned vendor implementation.
Funder
U.S. Department of Energy
National Nuclear Security Administration
Cited by
1 article.
1. Taking the MPI standard and the open MPI library to exascale;The International Journal of High Performance Computing Applications;2024-07-23