Affiliation:
1. Jinan University
2. WangAnxin Technology Co., Ltd.
Abstract
Log messages play a critical role in system analysis and issue resolution, particularly in complex software-intensive systems that demand high availability and quality assurance. However, log-based anomaly detection faces three major challenges. First, the sheer volume of log data, often millions of messages, poses a significant labeling challenge. Second, log data tends to exhibit severe class imbalance. Third, anomaly detection over such massive datasets requires both high accuracy and efficiency. Numerous deep learning-based methods have been proposed to tackle these challenges, yet a comprehensive solution that effectively addresses all of them has remained elusive. Through careful examination of log messages from stable systems, we identify a consistency principle: the number of unique anomalous logs is consistently small. Based on this principle, we present a novel framework called "Whole Lifecycle Tuning Anomaly Detection with Small Sample Logs" (SSADLog). SSADLog introduces a highly efficient log data pre-processing method that generates a representative subset of small sample logs. It leverages a pre-trained Bidirectional Encoder Representations from Transformers (BERT) language model to create contextual word embeddings, and it employs a semi-supervised fine-tuning process to enhance detection accuracy. A distinctive feature of SSADLog is its ability to fine-tune language models with small samples, completing high-performance iterations in approximately 30 minutes. Extensive experimental evaluations show that SSADLog greatly reduces the effort required to detect anomalous log messages among millions of daily new logs and outperforms previous representative methods across various log datasets in terms of precision, recall, and F1 score.
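To illustrate the contextual embedding step mentioned in the abstract, the following is a minimal sketch, assuming the Hugging Face transformers and PyTorch libraries with the generic bert-base-uncased checkpoint; the function name embed_logs and the sample log lines are hypothetical and not taken from the paper, which may use a different checkpoint and pooling strategy.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load a generic pre-trained BERT model (assumption: the paper's exact
# checkpoint and configuration are not specified here).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_logs(messages):
    """Return one contextual embedding per log message ([CLS] pooling)."""
    inputs = tokenizer(messages, padding=True, truncation=True,
                       return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Take the [CLS] token's hidden state as a message-level embedding.
    return outputs.last_hidden_state[:, 0, :]

# Hypothetical log messages for demonstration only.
logs = [
    "Receiving block blk_123 src: /10.0.0.1 dest: /10.0.0.2",
    "Exception in receiveBlock for block blk_456 java.io.IOException",
]
embeddings = embed_logs(logs)
print(embeddings.shape)  # torch.Size([2, 768])
```

In a setup like this, the resulting embeddings could feed a downstream classifier that is fine-tuned on a small labeled sample, in the spirit of the semi-supervised fine-tuning the abstract describes.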
Publisher
Research Square Platform LLC