Abstract
In this article, we address the problem of detecting anomalies in system log files. Computer systems generate huge numbers of events, which are noted in event log files. While most of them report normal actions, an unusual entry may inform about a failure or malware infection. A human operator may easily miss such an entry; therefore, anomaly detection methods are used for this purpose. In our work, we used an approach known from the natural language processing (NLP) domain, which operates on so-called embeddings, that is vector representations of words or phrases. We describe an improved version of the LogEvent2Vec algorithm, proposed in 2020. In contrast to the original version, we propose a significant shortening of the analysis window, which both increased the accuracy of anomaly detection and made further analysis of suspicious sequences much easier. We experimented with various binary classifiers, such as decision trees or multilayer perceptrons (MLPs), and the Blue Gene/L dataset. We showed that selecting an optimal classifier (in this case, MLP) and a short log sequence gave very good results. The improved version of the algorithm yielded the best F1-score of 0.997, compared to 0.886 in the original version of the algorithm.
Subject
Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science
Cited by
12 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献