Abstract
Containerization technology offers an appealing alternative for encapsulating and running applications (and all their dependencies) without incurring the performance penalties of virtual machines. As a result, it has attracted the interest of the High-Performance Computing (HPC) community as a way to obtain fast, customized, portable, flexible, and reproducible deployments of their workloads. Previous work in this area has demonstrated that containerized HPC applications can exploit InfiniBand networks, but has ignored the potential of multi-container deployments, which partition the processes of each application into multiple containers on each host. Partitioning HPC applications has been shown to be useful with virtual machines, by constraining each of them to a single NUMA (Non-Uniform Memory Access) domain. This paper conducts a systematic study of the performance of multi-container deployments with different network fabrics and protocols, focusing especially on InfiniBand networks. We analyze the impact of container granularity and its potential to exploit processor and memory affinity to improve application performance. Our results show that default Singularity can achieve near bare-metal performance but does not support fine-grained multi-container deployments. Docker and Singularity-instance behave similarly with respect to the performance of deployment schemes with different container granularity and affinity. This behavior differs across network fabrics and protocols, and also depends on the application's communication pattern and message size. Moreover, deployments on InfiniBand are more affected by computation and memory allocation and, for that reason, can exploit affinity better.
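To make "container granularity" and "affinity" concrete, the sketch below splits one host's cores evenly across a chosen number of containers and derives, for each container, the values Docker's `--cpuset-cpus` and `--cpuset-mems` options take, so that each container is pinned to the cores and memory of the NUMA domain(s) it spans. It is a minimal illustration under stated assumptions, not the paper's deployment tooling: the `Host` topology, the helper names `partition` and `docker_commands`, and the image name `hpc-app:latest` are hypothetical, and cores are assumed to be numbered contiguously within each NUMA domain (real machines often interleave core IDs across sockets).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    # Hypothetical topology: cores 0..cores_per_node-1 belong to NUMA node 0,
    # the next block to node 1, and so on (an assumption, not always true).
    numa_nodes: int
    cores_per_node: int


def partition(host: Host, containers: int) -> list[tuple[str, str]]:
    """Split the host's cores evenly across `containers` containers.

    Returns one (cpuset_cpus, cpuset_mems) pair per container, formatted
    like the values of Docker's --cpuset-cpus / --cpuset-mems (e.g. "0-7", "0").
    """
    total = host.numa_nodes * host.cores_per_node
    if containers <= 0 or total % containers:
        raise ValueError("containers must evenly divide the core count")
    per_container = total // containers
    result = []
    for i in range(containers):
        first = i * per_container
        last = first + per_container - 1
        # NUMA domains touched by this contiguous core range
        nodes = sorted({c // host.cores_per_node for c in range(first, last + 1)})
        result.append((f"{first}-{last}", ",".join(str(n) for n in nodes)))
    return result


def docker_commands(host: Host, containers: int, image: str = "hpc-app:latest") -> list[str]:
    """Render one hypothetical `docker run` line per container."""
    return [
        f"docker run -d --cpuset-cpus={cpus} --cpuset-mems={mems} {image}"
        for cpus, mems in partition(host, containers)
    ]


if __name__ == "__main__":
    host = Host(numa_nodes=2, cores_per_node=8)
    for n in (1, 2, 4):  # coarse to fine granularity
        print(n, partition(host, n))
```

With one container per host, the single container spans both NUMA domains ("0,1"); with one container per NUMA domain or finer, each container's CPUs and memory stay inside one domain, which is the kind of affinity the abstract refers to.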
Funder
Generalitat de Catalunya
Agencia Estatal de Investigación
Universitat Politècnica de Catalunya
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications, Software
Cited by
5 articles.
1. Control Groups Added Latency in NFVs: An Update Needed?;2023 IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN);2023-11-07
2. Performance Characterization of Multi-Container Deployment Schemes for Online Learning Inference;2023 IEEE 16th International Conference on Cloud Computing (CLOUD);2023-07
3. Toward the Observability of Cloud-Native Applications: The Overview of the State-of-the-Art;IEEE Access;2023
4. Fine-Grained Scheduling for Containerized HPC Workloads in Kubernetes Clusters;2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys);2022-12
5. Containers in HPC: a survey;The Journal of Supercomputing;2022-10-27