Preparing Distributed Computing Operations for the HL-LHC Era With Operational Intelligence

Published: 2022-01-07 Volume: 4
ISSN: 2624-909X
Container-title: Frontiers in Big Data
Short-container-title: Front. Big Data

Author:

Di Girolamo Alessandro, Legger Federica, Paparrigopoulos Panos, Schovancová Jaroslava, Beermann Thomas, Boehler Michael, Bonacorsi Daniele, Clissa Luca, Decker de Sousa Leticia, Diotalevi Tommaso, Giommi Luca, Grigorieva Maria, Giordano Domenico, Hohn David, Javůrek Tomáš, Jezequel Stephane, Kuznetsov Valentin, Lassnig Mario, Mageirakos Vasilis, Olocco Micol, Padolski Siarhei, Paltenghi Matteo, Rinaldi Lorenzo, Sharma Mayank, Tisbeni Simone Rossi, Tuckus Nikodemas

Abstract

As a joint effort from various communities involved in the Worldwide LHC Computing Grid, the Operational Intelligence project aims at increasing the level of automation in computing operations and reducing human interventions. The distributed computing systems currently deployed by the LHC experiments have proven to be mature and capable of meeting the experimental goals, by allowing timely delivery of scientific results. However, a substantial number of interventions from software developers, shifters, and operational teams is needed to efficiently manage such heterogeneous infrastructures. Under the scope of the Operational Intelligence project, experts from several areas have gathered to propose and work on “smart” solutions. Machine learning, data mining, log analysis, and anomaly detection are only some of the tools we have evaluated for our use cases. In this community study contribution, we report on the development of a suite of operational intelligence services to cover various use cases: workload management, data management, and site operations.

Funder

CERN

Publisher

Frontiers Media SA

Subject

Artificial Intelligence, Information Systems, Computer Science (miscellaneous)

Reference: 29 articles.

1. Unified Monitoring Architecture for IT and Grid Services;Aimar;J. Phys. Conf. Ser.,2017

2. Automating ATLAS Computing Operations Using the Site Status Board;Andreeva;J. Phys. Conf. Ser.,2012

3. New Solutions for Large Scale Functional Tests in the WLCG Infrastructure with SAM/Nagios: the Experiments Experience;Andreeva;J. Phys. Conf. Ser.,2012

4. CRIC: Computing Resource Information Catalogue as a Unified Topology System for a Large Scale, Heterogeneous and Dynamic Computing Infrastructure;Anisenkov;EPJ Web Conf.,2020

5. Global Grid User Support - Building a Worldwide Distributed User Support Infrastructure;Antoni;J. Phys. Conf. Ser.,2008

Cited by 1 article.

1. Analyzing WLCG File Transfer Errors Through Machine Learning;Computing and Software for Big Science;2022-10-22