CySecBERT: A Domain-Adapted Language Model for the Cybersecurity Domain

Author:

Bayer Markus 1, Kuehn Philipp 1, Shanehsaz Ramin 1, Reuter Christian 1

Affiliation:

1. PEASEC, Technical University of Darmstadt, Darmstadt, Germany

Abstract

The field of cybersecurity is evolving fast. Security professionals need intelligence on past, current and, ideally, upcoming threats, because attacks are becoming more advanced and increasingly target larger and more complex systems. Since such large amounts of information cannot be processed and analyzed manually, cybersecurity experts rely on machine learning techniques. In the textual domain, pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT) have proven helpful, as they provide a good baseline for further fine-tuning. However, due to the domain knowledge and the many technical terms in cybersecurity, general language models might miss the gist of textual information. For this reason, we create a high-quality dataset and present a language model specifically tailored to the cybersecurity domain that can serve as a basic building block for cybersecurity systems. The model is compared with other models on 15 tasks: domain-dependent extrinsic tasks for measuring the performance on specific problems, intrinsic tasks for measuring the performance of the internal representations of the model, and general tasks from the SuperGLUE benchmark. The results of the intrinsic tasks show that our model improves the internal representation space of domain words compared with the other models. The extrinsic, domain-dependent tasks, consisting of sequence tagging and classification, show that the model performs best in cybersecurity scenarios. In addition, we pay special attention to the choice of hyperparameters against catastrophic forgetting, as pre-trained models tend to forget the original knowledge during further training.

Funder

German Federal Ministry of Education and Research and the Hessian Ministry of Higher Education, Research, Science and the Arts

National Research Center for Applied Cybersecurity ATHENE

German Federal Ministry for Education and Research

CYLENCE

Publisher

Association for Computing Machinery (ACM)


Cited by 4 articles.

1. A Review of Advancements and Applications of Pre-Trained Language Models in Cybersecurity;2024 12th International Symposium on Digital Forensics and Security (ISDFS);2024-04-29

2. Navigating the Shadows: Manual and Semi-Automated Evaluation of the Dark Web for Cyber Threat Intelligence;IEEE Access;2024

3. Enhancing Autonomous System Security and Resilience With Generative AI: A Comprehensive Survey;IEEE Access;2024

4. HackMentor: Fine-Tuning Large Language Models for Cybersecurity;2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom);2023-11-01
