Improving Automated Labeling for ATT&CK Tactics in Malware Threat Reports


Domschot Eva (1), Ramyaa Ramyaa (1), Smith Michael R. (2)


1. New Mexico Institute of Mining and Technology, USA

2. Sandia National Laboratories, USA


Once novel malware is detected, the security companies that discover it write threat reports. These reports often vary in the terminology used to describe the malware's behavior, which makes it difficult to compare reports of the same malware from different companies. To aid in the automated discovery of novel malware, it was recently proposed that novel malware could be detected by identifying behaviors. This assumes that a core set of behaviors is present in most, if not all, malware variants. However, there is a lack of malware datasets that are labeled with behaviors. Motivated by the need to label malware with a common set of behaviors, this work examines automating the process of labeling malware with behaviors identified in malware threat reports despite the variability of terminology. To do so, we examine several techniques from the natural language processing (NLP) domain. We find that most state-of-the-art word embedding NLP methods require large amounts of data and are trained on generic text corpora, missing the nuances related to information security. To address this, we use simple feature selection techniques. We find that simple feature selection techniques generally outperform word embedding methods and achieve an increase of 6% in the F0.5-score over prior work when used to predict MITRE ATT&CK tactics in threat reports. Our work indicates that feature selection, which sophisticated NLP methods have commonly overlooked, is beneficial for information security related tasks, where more sophisticated NLP methodologies are not able to pick out relevant information security terms.
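The abstract reports its result as an F0.5-score without defining it. As a minimal sketch, assuming the standard F-beta definition (the weighted harmonic mean of precision and recall), the metric can be computed as below. The function name `f_beta` and the precision and recall values are hypothetical illustrations, not figures from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: (1 + b^2) * P * R / (b^2 * P + R).

    With beta < 1 (for example 0.5), precision is weighted more
    heavily than recall. This is the textbook definition, assumed
    here because the abstract does not spell it out.
    """
    if precision < 0 or recall < 0 or beta <= 0:
        raise ValueError("precision and recall must be >= 0, beta must be > 0")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Hypothetical values, chosen only to show how beta shifts the weighting.
precise_labeler = f_beta(precision=0.8, recall=0.4, beta=0.5)
broad_labeler = f_beta(precision=0.4, recall=0.8, beta=0.5)
balanced_f1 = f_beta(precision=0.8, recall=0.4, beta=1.0)
```

With beta set to 0.5, the hypothetical labeler with higher precision scores above the one with higher recall, even though the two have the same F1. A precision-weighted metric of this kind fits a labeling task where a wrong tactic label is considered costlier than a missed one.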


Association for Computing Machinery (ACM)


Computer Networks and Communications, Computer Science Applications, Hardware and Architecture, Safety Research, Information Systems, Software







