Combining Log Files and Monitoring Data to Detect Anomaly Patterns in a Data Center

Author:

Viola Laura, Ronchieri Elisabetta, Cavallaro Claudia

Abstract

Context: Anomaly detection in a data center is a challenging task because it must account for different services running on various resources. Current literature applies artificial intelligence and machine learning techniques to either log files or monitoring data: the former are created by services at run time, while the latter are produced by specific sensors directly on the physical or virtual machine. Objectives: We propose a model that exploits information in both log files and monitoring data to identify patterns and detect anomalies over time, at both the service level and the machine level. Methods: The key idea is to construct a specific dictionary for each log file, which helps to extract anomalous n-grams in the feature matrix. Several Natural Language Processing techniques, such as word clouds and topic modeling, were used to enrich this dictionary. A clustering algorithm was then applied to the feature matrix to identify and group the various types of anomalies. In parallel, a time series anomaly detection technique was applied to sensor data so that problems found in the log files could be combined with problems recorded in the monitoring data. Several services (i.e., log files) running on the same machine were grouped together with the monitoring metrics. Results: We tested our approach on a real data center equipped with log files and monitoring data that characterize the behaviour of physical and virtual resources in production. The data were provided by the National Institute for Nuclear Physics in Italy. We observed a correspondence between anomalies in log files and in monitoring data, e.g., a decrease in memory usage or an increase in machine load. The results are promising. Conclusions: Important outcomes emerged from the integration of these two types of data. Our model requires integrating site administrators' expertise in order to consider all critical scenarios in the data center and to interpret the results properly.
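
The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary terms, function names, sample data, and thresholds below are all hypothetical, the clustering and topic modeling steps are omitted, and a plain z-score stands in for the time series anomaly detection technique, which the abstract does not name. The sketch only shows the general idea of flagging time windows in which dictionary-matched n-grams in a log coincide with an outlier in a monitoring metric.

```python
from statistics import mean, stdev

# Hypothetical per-log dictionary of terms that mark a message as suspicious.
ANOMALY_TERMS = {"error", "failed", "timeout"}


def ngrams(message, n=2):
    """Return the word n-grams of a log message, lowercased."""
    words = message.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def anomalous_ngram_counts(log_entries, n=2):
    """Count, per time window, the n-grams that contain a dictionary term."""
    counts = {}
    for window, message in log_entries:
        hits = sum(1 for gram in ngrams(message, n) if ANOMALY_TERMS & set(gram))
        counts[window] = counts.get(window, 0) + hits
    return counts


def metric_outliers(series, threshold=2.0):
    """Return windows whose value lies more than `threshold` sample
    standard deviations from the mean (a simple z-score stand-in)."""
    values = list(series.values())
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return set()
    return {w for w, v in series.items() if abs(v - mu) / sigma > threshold}


def correlated_windows(log_entries, series, n=2, threshold=2.0):
    """Return the sorted windows flagged by both the log and the metric."""
    log_flags = {w for w, c in anomalous_ngram_counts(log_entries, n).items() if c > 0}
    return sorted(log_flags & metric_outliers(series, threshold))


# Made-up sample data: (time window, log message) pairs and machine load per window.
log_entries = [
    (0, "service started ok"),
    (1, "request served ok"),
    (2, "connection timeout while writing"),
    (3, "request served ok"),
    (5, "disk check failed"),
]
load = {0: 1.0, 1: 1.1, 2: 9.0, 3: 0.9, 4: 1.0, 5: 1.1, 6: 1.0, 7: 0.9}

print(correlated_windows(log_entries, load))
```

With this sample data, windows 2 and 5 both contain dictionary-matched n-grams, but only window 2 also shows a load outlier, so it is the only window reported as a combined anomaly.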

Publisher

MDPI AG

Subject

Computer Networks and Communications, Human-Computer Interaction

