Anomaly Detection Using Autoencoders in High Performance Computing Systems

Published:2019-07-17 Issue: Volume:33 Page:9428-9433
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Borghesi Andrea,Bartolini Andrea,Lombardi Michele,Milano Michela,Benini Luca

Abstract

Anomaly detection in supercomputers is a very difficult problem due to the large scale of the systems and the high number of components. The current state of the art for automated anomaly detection employs Machine Learning methods or statistical regression models in a supervised fashion, meaning that the detection tool is trained to distinguish among a fixed set of behaviour classes (healthy and unhealthy states). We propose a novel approach for anomaly detection in High Performance Computing systems based on a Machine (Deep) Learning technique, namely a type of neural network called an autoencoder. The key idea is to train a set of autoencoders to learn the normal (healthy) behaviour of the supercomputer nodes and, after training, use them to identify abnormal conditions. This differs from previous approaches, which were based on learning the abnormal conditions, for which there are much smaller datasets (since it is very hard to identify them to begin with). We test our approach on a real supercomputer equipped with a fine-grained, scalable monitoring infrastructure that can provide large amounts of data to characterize the system behaviour. The results are extremely promising: after the training phase to learn the normal system behaviour, our method is capable of detecting anomalies that have never been seen before with very good accuracy (values ranging between 88% and 96%).
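
The key idea in the abstract (train only on healthy behaviour, then flag inputs the autoencoder reconstructs poorly) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the network architecture, the monitored metrics, or the decision rule. The synthetic data, the single small tanh autoencoder, the reconstruction-error score, and the 99th-percentile threshold are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np

def train_autoencoder(X, hidden=3, epochs=5000, lr=0.01, seed=0):
    """Fit a one-hidden-layer autoencoder (tanh encoder, linear decoder)
    with full-batch gradient descent on squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_features, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.1, size=(hidden, n_features))
    b2 = np.zeros(n_features)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                # encode
        out = h @ W2 + b2                       # decode
        d_out = 2.0 * (out - X) / len(X)        # gradient of the loss w.r.t. the output
        d_h = (d_out @ W2.T) * (1.0 - h ** 2)   # backpropagate through tanh
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return W1, b1, W2, b2

def reconstruction_error(params, X):
    """Per-sample mean squared error between input and reconstruction."""
    W1, b1, W2, b2 = params
    out = np.tanh(X @ W1 + b1) @ W2 + b2
    return ((out - X) ** 2).mean(axis=1)

# Synthetic stand-in for one node's telemetry (assumption): 8 metrics
# that are driven by 2 hidden factors while the node is healthy.
rng = np.random.default_rng(1)
n_features, n_latent = 8, 2
mixing = rng.normal(size=(n_latent, n_features))

def sample_healthy(n):
    z = rng.normal(size=(n, n_latent))
    return z @ mixing + 0.05 * rng.normal(size=(n, n_features))

def sample_anomalous(n):
    # Hypothetical fault: metrics no longer follow their usual correlations.
    return sample_healthy(n) + rng.normal(scale=1.0, size=(n, n_features))

# Train on healthy data only; no anomalous sample is seen during training.
train = sample_healthy(1000)
mean, std = train.mean(axis=0), train.std(axis=0)
params = train_autoencoder((train - mean) / std)

# Assumed decision rule: flag anything above the 99th percentile
# of the reconstruction error observed on the healthy training data.
threshold = np.percentile(reconstruction_error(params, (train - mean) / std), 99)

err_healthy = reconstruction_error(params, (sample_healthy(500) - mean) / std)
err_anomalous = reconstruction_error(params, (sample_anomalous(500) - mean) / std)
false_alarm_rate = (err_healthy > threshold).mean()
detection_rate = (err_anomalous > threshold).mean()
print(f"false alarm rate: {false_alarm_rate:.3f}, detection rate: {detection_rate:.3f}")
```

Because the model only ever sees healthy samples, no labelled anomalies are needed for training. The abstract describes a set of autoencoders covering the supercomputer nodes, so in that setting a sketch like this would be repeated per node, each with its own threshold.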

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 92 articles.

1. ASOD: an adaptive stream outlier detection method using online strategy;Journal of Cloud Computing;2024-07-05

2. Reimagine Application Performance as a Graph: Novel Graph-Based Method for Performance Anomaly Classification in High-Performance Computing;2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC);2024-07-02

3. HD-KT: Advancing Robust Knowledge Tracing via Anomalous Learning Interaction Detection;Proceedings of the ACM Web Conference 2024;2024-05-13

4. Runtime Performance Anomaly Diagnosis in Production HPC Systems Using Active Learning;IEEE Transactions on Parallel and Distributed Systems;2024-04

5. Discriminative boundary generation for effective outlier detection;Knowledge and Information Systems;2024-01-22