Towards robust log parsing using self-supervised learning for system security analysis

Author:

Cao Jinhui12,Di Xiaoqiang123,Liu Xu12,Xu Rui12,Li Jinqing12,Ren Weiwu12,Qi Hui12,Hu Pengfei4,Zhang Kehan12,Li Bo12

Affiliation:

1. School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, Jilin, China

2. Jilin Province Key Laboratory of Network and Information Security,Changchun University of Science and Technology, Changchun, Jilin, China

3. Information Center, Changchun University of Science and Technology, Changchun, Jilin, China

4. School of Computer Science and Technology, Shandong University, Qingdao, Shandong, China

Abstract

Logs play an important role in anomaly detection, fault diagnosis, and trace checking of software and network systems. Log parsing, which converts each raw log line to a constant template and a variable parameter list, is a prerequisite for system security analysis. Traditional parsing methods utilizing specific rules can only parse logs of specific formats, and most parsing methods based on deep learning require labels. However, the existing parsing methods are not applicable to logs of inconsistent formats and insufficient labels. To address these issues, we propose a robust Log parsing method based on Self-supervised Learning (LogSL), which can extract templates from logs of different formats. The essential idea of LogSL is modeling log parsing as a multi-token prediction task, which makes the multi-token prediction model learn the distribution of tokens belonging to the template in raw log lines by self-supervision mode. Furthermore, to accurately predict the tokens of the template without labeled data, we construct a Multi-token Prediction Model (MPM) combining the pre-trained XLNet module, the n-layer stacked Long Short-Term Memory Net module, and the Self-attention module. We validate LogSL on 12 benchmark log datasets, resulting in the average parsing accuracy of our parser being 3.9% higher than that of the best baseline method. Experimental results show that LogSL has superiority in terms of robustness and accuracy. In addition, a case study of anomaly detection is conducted to demonstrate the support of the proposed MPM to system security tasks based on logs.

Publisher

IOS Press

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3