Affiliation:
1. Department of Computer Science, University of Oregon, Eugene, Oregon, USA
2. Department of Computer Science, University of Helsinki, Helsinki, Finland
3. NVIDIA Corporation, Santa Clara, California, USA
4. Institute of Astronomy and Astrophysics, Academia Sinica, Taipei, Taiwan
5. Department of Computer Science, Aalto University, Espoo, Finland
Abstract
With the rise of exascale systems and large, data-centric workflows, the need to observe and analyze high performance computing (HPC) applications during their execution is becoming increasingly important. HPC applications are typically not designed with online monitoring in mind; the observability challenge therefore lies in accessing and analyzing interesting events with low overhead while seamlessly integrating such capabilities into existing and new applications. We explore how our service-based observation, monitoring, and analytics (SOMA) approach, which collects and aggregates both application-specific diagnostic data and performance data, addresses these needs. We present our SOMA framework and demonstrate its viability with LULESH, a hydrodynamics proxy application. We then focus on Astaroth, a multi-GPU library for stencil computations, highlighting the integration of the TAU and APEX performance tools with SOMA for application and performance data monitoring.
Funder
European Research Council
U.S. Department of Energy
Cited by
2 articles.
1. Enabling Performance Observability for Heterogeneous HPC Workflows with SOMA; Proceedings of the 53rd International Conference on Parallel Processing; 2024-08-12
2. Self Adjusting Log Observability for Cloud Native Applications; 2024 IEEE 17th International Conference on Cloud Computing (CLOUD); 2024-07-07