Authors:
Mosciatti Simone, Lange Clemens, Blomer Jakob
Abstract
Recent years have seen a revolution in the way scientific workloads are executed, thanks to the wide adoption of software containers. These containers run largely isolated from the host system, ensuring that the development and execution environments are the same everywhere. This enables full reproducibility of the workloads and therefore of the associated scientific analyses. However, as research software becomes increasingly complex, software images easily grow to several gigabytes in size. Downloading the full image onto every compute node on which the containers are executed becomes impractical. In this paper, we describe a novel way of distributing software images on the Kubernetes platform, with which the container can start before the entire image contents become available locally (so-called “lazy pulling”). Each file required for the execution is fetched individually on demand and subsequently cached using the CernVM file system (CVMFS), enabling the execution of very large software images on potentially thousands of Kubernetes nodes with very little overhead. We present several performance benchmarks based on typical high-energy physics analysis workloads.
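The idea of lazy pulling can be illustrated with a toy model. The sketch below is not the CVMFS or containerd API; the `LazyImage` class, the `REMOTE_IMAGE` dictionary, and the file paths are hypothetical names introduced only for illustration. It shows a workload that reads two of three files, so only those two are ever transferred, and a repeated read is served from the local cache.

```python
from typing import Callable, Dict


class LazyImage:
    """Toy model of a lazily pulled image: a file is fetched on first access only."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # hypothetical remote fetch function
        self._cache: Dict[str, bytes] = {}   # local cache, keyed by file path
        self.remote_fetches = 0              # number of files actually transferred

    def read(self, path: str) -> bytes:
        # Fetch the file from the remote side only if it is not cached yet.
        if path not in self._cache:
            self._cache[path] = self._fetch(path)
            self.remote_fetches += 1
        return self._cache[path]


# Stand-in for a large image held in a remote repository (assumption).
REMOTE_IMAGE = {
    "/usr/bin/analysis": b"binary",
    "/usr/lib/libphysics.so": b"library",
    "/opt/data/unused.dat": b"never read by this workload",
}

image = LazyImage(REMOTE_IMAGE.__getitem__)
image.read("/usr/bin/analysis")
image.read("/usr/lib/libphysics.so")
image.read("/usr/bin/analysis")  # second read is served from the local cache
print(image.remote_fetches)
```

In this toy model the unused file is never transferred, which is the property that lets a container start without the full image being present locally.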
Subject
Artificial Intelligence, Information Systems, Computer Science (miscellaneous)