Affiliation:
1. School of Computer Science and Engineering, Nanjing University of Science and Technology, No. 200, Xiaolingwei Street, Nanjing 210094, P. R. China
Abstract
In response to the growing sophistication of cyber threat actors, it is imperative to make the best use of cyber threat intelligence converted from structured or semi-structured data, together with Named Entity Recognition (NER) techniques that help extract critical cyber threat intelligence. To promote NER research in the Cyber Threat Intelligence (CTI) domain, we provide a Large Dataset for NER in Cyber Threat Intelligence (LDNCTI). On the LDNCTI corpus, we investigate the feasibility of mainstream transformer-based models in the CTI domain. To address the problem of imbalanced label distribution, we introduce a transformer-based model trained with a loss built on a Triplet Loss based on metric learning and a Sorted Gradient harmonizing mechanism (TSGL). Our experimental results show that LDNCTI represents critical threat intelligence well, and that our transformer-based model with the new loss function outperforms previous schemes on the Dataset for NER in Threat Intelligence (DNRTI) and the dataset for NER in Advanced Persistent Threats (APTNER).
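The abstract does not define TSGL, so the sketch below does not reproduce the authors' loss. It only illustrates, under stated assumptions, the two standard ingredients the name refers to. The first is a triplet margin loss from metric learning. The second is gradient-density weighting in the style of the gradient harmonizing mechanism (GHM-C) originally proposed for object detection. That weighting down-weights the many easy tokens (for example, the dominant "O" label in NER) relative to rare, hard ones. The "sorted" variant is not modeled. All function names (`triplet_loss`, `ghm_weights`, `ghm_cross_entropy`) are hypothetical. Using `g = 1 - p_true` as the per-token gradient norm is an assumption that holds for softmax cross-entropy with respect to the true-class logit.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss with Euclidean distance:
    max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


def ghm_weights(p_true, bins=10):
    """GHM-C-style weights (hypothetical helper, not the paper's TSGL).

    p_true: predicted probability of the gold label for each token.
    Assumption: the gradient norm of softmax cross-entropy is g = 1 - p_true.
    Tokens whose g falls in a crowded bin receive smaller weights.
    """
    n = len(p_true)
    g = [1.0 - p for p in p_true]
    # Map each g in [0, 1] to a bin index; g == 1.0 goes into the last bin.
    idx = [min(int(gi * bins), bins - 1) for gi in g]
    counts = [0] * bins
    for i in idx:
        counts[i] += 1
    non_empty = sum(1 for c in counts if c > 0)
    # weight_i = N / (count of token i's bin * number of non-empty bins),
    # so the weights sum to N.
    return [n / (counts[i] * non_empty) for i in idx]


def ghm_cross_entropy(p_true, bins=10):
    """Mean cross-entropy over tokens, reweighted by gradient density."""
    w = ghm_weights(p_true, bins)
    return sum(wi * -math.log(p) for wi, p in zip(w, p_true)) / len(p_true)


if __name__ == "__main__":
    # Three easy tokens (high gold-label probability) and one hard entity token.
    probs = [0.95, 0.96, 0.97, 0.2]
    print(ghm_weights(probs))        # the hard token gets the largest weight
    print(ghm_cross_entropy(probs))
    print(triplet_loss((0.0, 0.0), (1.0, 0.0), (1.5, 0.0)))
```

Each weight is inversely proportional to how many tokens share a similar gradient norm. Easy, abundant tokens therefore contribute less per token, which is one common way to counter an imbalanced label distribution. How TSGL actually combines a triplet term with gradient harmonizing is specified in the paper itself, not here.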
Publisher
World Scientific Pub Co Pte Ltd
Subject
Electrical and Electronic Engineering, Hardware and Architecture
Cited by
1 article.
1. Fault Diagnosis with Imbalanced Datasets for Sucker Rod Pumping System Based on Dynamometer Card. 2023 CAA Symposium on Fault Detection, Supervision and Safety for Technical Processes (SAFEPROCESS), 2023-09-22.