An Email Cyber Threat Intelligence Method Using Domain Ontology and Machine Learning

Published:2024-07-11 Issue:14 Volume:13 Page:2716
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Venčkauskas Algimantas¹, Toldinas Jevgenijus¹, Morkevičius Nerijus¹, Sanfilippo Filippo²

Affiliation:

1. Department of Computer Science, Kaunas University of Technology, 44249 Kaunas, Lithuania

2. Department of Engineering Sciences, University of Agder (UiA), 4879 Grimstad, Norway

Abstract

Email is an excellent technique for connecting users at low cost. Spam emails pose the risk of collecting a user’s personal information by fooling them into clicking on a link or engaging in other fraudulent activities. Furthermore, when a spam message is delivered, the user may read the entire message before deciding it is spam and deleting it. Most approaches to email classification proposed by other authors use natural language processing (NLP) methods to analyze the content of email messages. One of the biggest shortcomings of NLP-based methods is their dependence on the language in which a message is written. To construct an effective email cyber threat intelligence (CTI) sharing framework, the privacy of a message’s content must be preserved. This article proposes a novel domain-specific ontology and method for emails that require only the metadata of email messages to be shared to preserve their privacy, making them applicable to solutions for sharing email CTI. To preserve privacy, a new semantic parser was developed for the proposed email domain-specific ontology to populate email metadata and create a dataset. Machine learning algorithms were examined, and experiments were conducted to identify and classify spam messages using the newly created dataset. Feature-ranking algorithms, chi-squared, ANOVA (analysis of variance), and Kruskal–Wallis tests were used. In all experiments, the kernel naïve Bayes model demonstrated acceptable results. The highest accuracy of 92.28% and an F1 score of 95.92% for recognizing spam email messages were obtained using the proposed domain-specific ontology, the newly developed semantic parser, and the created metadata dataset.
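
The metadata-only idea in the abstract can be illustrated with a minimal sketch that reads header fields from a raw message and never inspects the body. This is not the authors' semantic parser or ontology: the function name `extract_metadata`, the feature names, and the sample message are all hypothetical and chosen only for illustration, using Python's standard `email` package.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def extract_metadata(raw_message: str) -> dict:
    """Return header-derived features only; the message body is never read.

    The feature set is an illustrative assumption, not the paper's ontology.
    """
    msg = message_from_string(raw_message)

    # parseaddr returns (display_name, address); only the address is used.
    _, sender = parseaddr(msg.get("From", ""))
    _, reply_to = parseaddr(msg.get("Reply-To", ""))

    # Collect every To/Cc address across possibly repeated header fields.
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))

    sender_domain = sender.rpartition("@")[2].lower()
    reply_domain = reply_to.rpartition("@")[2].lower()

    return {
        "sender_domain": sender_domain,
        # A Reply-To domain that differs from the sender domain (hypothetical feature).
        "reply_to_differs": bool(reply_to) and reply_domain != sender_domain,
        "recipient_count": len(recipients),
        # Each relay normally prepends one Received header.
        "received_hops": len(msg.get_all("Received", [])),
        "is_multipart": msg.is_multipart(),
        "content_type": msg.get_content_type(),
    }


# Hypothetical sample message using reserved example domains.
SAMPLE = (
    "Received: from mail.example.net by mx.example.org; Thu, 11 Jul 2024 10:00:00 +0000\n"
    "Received: from relay.example.com by mail.example.net; Thu, 11 Jul 2024 09:59:58 +0000\n"
    "From: Alice <alice@example.com>\n"
    "Reply-To: billing@example.net\n"
    "To: bob@example.org, carol@example.org\n"
    "Subject: Invoice\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Body text that is never read by extract_metadata.\n"
)

if __name__ == "__main__":
    for name, value in extract_metadata(SAMPLE).items():
        print(f"{name}: {value}")
```

Because the features come from headers alone, a record like this could be shared or fed to a classifier without exposing message content and without depending on the language the message is written in.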

Funder

Economic Revitalization and Resilience Enhancement Plan “New Generation Lithuania”

Publisher

MDPI AG

Link

https://www.mdpi.com/2079-9292/13/14/2716/pdf
