Evolving High-Performance Computing Data Centers with Kubernetes, Performance Analysis, and Dynamic Workload Placement Based on Machine Learning Scheduling

Published: 2024-07-05 Issue: 13 Volume: 13 Page: 2651
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics

Author:

Dakić Vedran¹, Kovač Mario², Slovinac Jurica¹

Affiliation:

1. Department of Operating Systems, Algebra University, 10000 Zagreb, Croatia

2. Department of Control and Computer Engineering, Faculty of Electrical Engineering and Computing, University of Zagreb, 10000 Zagreb, Croatia

Abstract

Over the past twenty years, the IT industry has moved from running workloads on physical servers to consolidating them through virtualization and, in the next iteration, consolidating them further into containers. Container workloads based on Docker and Podman were later orchestrated via Kubernetes or OpenShift. High-performance computing (HPC) environments, by contrast, have lagged in this process, as much work is still needed to determine how to apply containerization platforms to HPC. Containers have many advantages: they tend to have less overhead while providing flexibility, modularity, and maintenance benefits. This makes them well suited to tasks that require a lot of computing power and are latency- or bandwidth-sensitive. However, they are complex to manage, and many daily operations rely on command-line procedures that take years to master. This paper proposes a different architecture based on seamless hardware integration and a user-friendly user interface (UI). It also offers dynamic workload placement based on real-time performance analysis and prediction and on Machine Learning-based scheduling. This solves a prevalent issue in Kubernetes, the suboptimal placement of workloads, without requiring individual workload schedulers, which are challenging to write and take a long time to debug and test properly. Furthermore, the application we developed to implement this architecture helps with the Kubernetes installation process, which is fully automated regardless of the hardware platform used: x86, ARM, and soon, RISC-V. The results we achieved using this architecture and application are very promising in two areas: the speed of workload scheduling and the placement of workloads on the correct node. This also enables us to focus on one of the key HPC issues: energy efficiency.

Publisher

MDPI AG

Link

https://www.mdpi.com/2079-9292/13/13/2651/pdf
