Semantic Hierarchical Classification Applied to Anomaly Detection Using System Logs with a BERT Model

Published: 2024-06-21 Issue: 13 Volume: 14 Page: 5388
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en

Author:

Corbelle Clara¹, Carneiro Victor¹, Cacheda Fidel¹

Affiliation:

1. Department of Computer Science and Information Technologies, University of A Coruña, 15071 A Coruña, Spain

Abstract

The compaction and structuring of system logs facilitate and expedite anomaly and cyberattack detection processes using machine-learning techniques, while simultaneously reducing alert fatigue caused by false positives. In this work, we implemented an innovative algorithm that employs hierarchical codes based on the semantics of natural language, enabling the generation of a significantly reduced log that preserves the semantics of the original. This method uses codes that reflect the specificity of the topic and its position within a higher hierarchical structure. By applying this catalog to the analysis of logs from the Hadoop Distributed File System (HDFS), we achieved a concise summary with non-repetitive themes, significantly speeding up log analysis and resulting in a substantial reduction in log size while maintaining high semantic similarity. The resulting log has been validated for anomaly detection using the “bert-base-uncased” model and compared with six other methods: PCA, IM, LogCluster, SVM, DeepLog, and LogRobust. The reduced log achieved very similar values in precision, recall, and F1-score metrics, but drastically reduced processing time.
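The compaction idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract describes codes derived from natural-language semantics, whereas this sketch substitutes plain keyword matching as a stand-in. The catalog entries, the "parent.child" code format, the function names, and the sample lines are all hypothetical and invented for illustration.

```python
from itertools import groupby

# Hypothetical catalog: keyword -> hierarchical code ("parent.child").
# These entries are illustrative only and are not taken from the paper.
CATALOG = {
    "receiving block": "1.1",
    "received block": "1.2",
    "deleting block": "2.1",
    "exception": "3.1",
}


def encode(line: str) -> str:
    """Return the code of the first catalog keyword found in the line, or "0" if none match."""
    lowered = line.lower()
    for keyword, code in CATALOG.items():
        if keyword in lowered:
            return code
    return "0"


def compact(lines):
    """Encode each line and collapse consecutive repeats into (code, count) pairs."""
    codes = (encode(line) for line in lines)
    return [(code, sum(1 for _ in run)) for code, run in groupby(codes)]


def parent(code: str) -> str:
    """Top-level theme of a hierarchical code, e.g. "1.2" -> "1"."""
    return code.split(".")[0]


# Invented sample lines, loosely styled like block-storage log messages.
sample = [
    "Receiving block blk_1 src: /10.0.0.1",
    "Receiving block blk_1 src: /10.0.0.2",
    "Received block blk_1 of size 67108864",
    "Deleting block blk_1 file /data/blk_1",
    "writeBlock blk_2 received exception java.io.IOException",
]
summary = compact(sample)
print(summary)
```

In this toy run, five lines shrink to four coded entries, and `parent` shows how a code still exposes its broader theme. A shortened sequence of this kind is the sort of input a downstream classifier could consume in place of the raw log.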

Publisher

MDPI AG

Link

https://www.mdpi.com/2076-3417/14/13/5388/pdf
