Affiliation:
1. Beijing Jiaotong University
2. University of Central Florida
Abstract
Abstract
With versatility and complexity of computer systems, warning and errors are inevitable. To effectively monitor system’s status, system logs are critical. To detect anomalies in system logs, deep learning is a promising way to go. However, abnormal system logs in the real world are often difficult to collect, and effectively and accurately categorize the logs is an even time-consuming project. Thus, the data incompleteness is not conducive to the deep learning for this practical application. In this paper, we put forward a novel semi-supervised dual branch model that alleviate the need for large scale labeled logs for training a deep system log anomaly detector. Specifically, our model consists of two homogeneous networks that share the same parameters, one is called weak augmented teacher model and the other is termed as strong augmented student model. In the teacher model, the log features are augmented with small Gaussian noise, while in the student model, the strong augmentation is injected to force the model to learn a more robust feature representation with the guidance of teacher model provided soft labels. Furthermore, to further utilize unlabeled samples effectively, we propose a flexible label screening strategy that takes into account the confidence and stability of pseudo-labels. Experimental results show favorable effect of our model on prevalent HDFS and Hadoop Application datasets. Precisely, with only 30% training data labeled, our model can achieve the comparable results as the fully supervised version.
Publisher
Research Square Platform LLC