Managing the CERN Batch System with Kubernetes

Authors:

Fernandez Alvarez Luis, Datskova Olga, Jones Ben, McCance Gavin

Abstract

The CERN Batch Service faces many challenges in getting ready for the computing demands of future LHC runs. These challenges require that we look at all potential resources, assess how efficiently we use them, and explore different alternatives to exploit opportunistic resources both within our infrastructure and outside the CERN computing centre. Several projects, such as BEER, Helix Nebula Science Cloud and the new OCRE project, have proven our ability to run batch workloads on a wide range of non-traditional resources. However, the challenge is not only to obtain the raw compute resources needed but also to define an operational model that is cost- and time-efficient, scalable, and flexible enough to adapt to a heterogeneous infrastructure. To tackle both the provisioning and operational challenges, it was decided to use Kubernetes. By using Kubernetes we benefit from a de facto standard in containerised environments, available in nearly all cloud providers and surrounded by a vibrant ecosystem of open-source projects. Leveraging Kubernetes’ built-in functionality and other open-source tools such as Helm, Terraform and GitLab CI, we have deployed a first cluster prototype, which we discuss in detail. The effort has simplified many of our existing operational procedures, but it has also made us rethink established procedures and assumptions that were only valid in a VM-based cloud environment. This contribution presents how we have adopted Kubernetes into the CERN Batch Service, the impact its adoption has on daily operations, a comparison of resource usage efficiency, and our experience so far in evolving our infrastructure towards this model.

Publisher

EDP Sciences


Cited by 2 articles.