Performance characterization of containerization for HPC workloads on InfiniBand clusters: an empirical study

Published:2021-11-16
ISSN:1386-7857
Container-title:Cluster Computing
language:en
Short-container-title:Cluster Comput

Author:

Liu Peini, Guitart Jordi

Abstract

Containerization technology offers an appealing alternative for encapsulating and operating applications (and all their dependencies) without incurring the performance penalties of Virtual Machines. As a result, it has attracted the interest of the High-Performance Computing (HPC) community as a way to obtain fast, customized, portable, flexible, and reproducible deployments of their workloads. Previous work in this area has demonstrated that containerized HPC applications can exploit InfiniBand networks, but has ignored the potential of multi-container deployments, which partition the processes that belong to each application into multiple containers on each host. Partitioning HPC applications has proven useful with virtual machines, where each virtual machine is constrained to a single NUMA (Non-Uniform Memory Access) domain. This paper conducts a systematic study of the performance of multi-container deployments with different network fabrics and protocols, focusing especially on InfiniBand networks. We analyze the impact of container granularity and its potential to exploit processor and memory affinity to improve applications’ performance. Our results show that default Singularity can achieve near bare-metal performance but does not support fine-grained multi-container deployments. Docker and Singularity-instance behave similarly in terms of the performance of deployment schemes with different container granularity and affinity. This behavior differs across network fabrics and protocols, and also depends on the application communication patterns and the message size. Moreover, deployments on InfiniBand are more strongly affected by computation and memory allocation and, because of that, can exploit affinity better.

Funder

Generalitat de Catalunya

Agencia Estatal de Investigación

Universitat Politècnica de Catalunya

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications, Software

Link

https://link.springer.com/content/pdf/10.1007/s10586-021-03460-8.pdf

Reference: 26 articles.

1. Iosup, A., Ostermann, S., Yigitbasi, M.N., Prodan, R., Fahringer, T., Epema, D.: Performance analysis of cloud computing services for many-tasks scientific computing. IEEE Trans. Parallel Distrib. Syst. 22(6), 931–945 (2011). https://doi.org/10.1109/TPDS.2011.66

2. Beltre, A.M., Saha, P., Govindaraju, M., Younge, A., Grant, R.E.: Enabling HPC workloads on cloud infrastructure using Kubernetes container orchestration mechanisms. In: Proceedings of CANOPIE-HPC 2019: 1st International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC, pp. 11–20 (2019). https://doi.org/10.1109/CANOPIE-HPC49598.2019.00007

3. Liu, P., Guitart, J.: Performance comparison of multi-container deployment schemes for HPC workloads: an empirical study. Journal of Supercomputing (2020). https://doi.org/10.1007/s11227-020-03518-1

4. Zhang, J., Lu, X., Panda, D.K.: High performance MPI library for container-based HPC cloud on InfiniBand clusters. In: 45th International Conference on Parallel Processing (ICPP), pp. 268–277. IEEE (2016). https://doi.org/10.1109/ICPP.2016.38

5. Ibrahim, K.Z., Hofmeyr, S., Iancu, C.: The case for partitioning virtual machines on multicore architectures. IEEE Trans. Parallel Distrib. Syst. 25(10), 2683–2696 (2014). https://doi.org/10.1109/TPDS.2013.242

Cited by 5 articles.

1. Control Groups Added Latency in NFVs: An Update Needed?;2023 IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN);2023-11-07

2. Performance Characterization of Multi-Container Deployment Schemes for Online Learning Inference;2023 IEEE 16th International Conference on Cloud Computing (CLOUD);2023-07

3. Toward the Observability of Cloud-Native Applications: The Overview of the State-of-the-Art;IEEE Access;2023

4. Fine-Grained Scheduling for Containerized HPC Workloads in Kubernetes Clusters;2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys);2022-12

5. Containers in HPC: a survey;The Journal of Supercomputing;2022-10-27