Evaluating Supervised Machine Learning Models for Zero-Day Phishing Attack Detection: A Comprehensive Study

Author:

Lotfi Zahra1,Valipourebrahimi Sara1,Tran Thomas1

Affiliation:

1. University of Ottawa

Abstract

Abstract To have highly secure e-commerce websites, detecting and preventing cyber-attacks is of high importance. Among diverse types of cyber-attacks, identifying zero-day attacks is problematic since they are unknown to the security system. It is because they usually are launched by an attacker and none of the existing defined patterns match with the unknown (malicious) case. There are many machine learning models developed to analyze and detect phishing websites, specifically using supervised models. However, the main issue with zero-day attacks is that they are not seen before, so their patterns are not trained to the model. Thus, the supervised models designed for detecting phishing URLs should be very accurate in predicting the label of unseen data. This research addresses the underlying issue by evaluating seven different supervised machine learning models to assess their accuracy in predicting zero-day phishing attacks. Unlike previous studies that examined models on features that are only extracted from URLs, our evaluation framework incorporates a comprehensive dataset that includes not only URL features but also third-party extracted features as well as content-based features. This research also examines the performance of the models under the impact of dimension reduction techniques. By reducing the dimensionality of the dataset, we aim to improve computational efficiency without compromising the accuracy of the models. The results depict that XGBoost performs best on zero-day attack data sets with accuracy and an f1-score of 96.6%, and PCA can be applied in high-dimensional data sets without adverse effects on the models’ performance.

Publisher

Research Square Platform LLC

Reference21 articles.

1. Combining Long-Term Recurrent Convolutional and Graph Convolutional Networks to Detect Phishing Sites Using URL and HTML;Ariyadasa S;IEEE Access,2022

2. Abdelnabi, S., Krombholz, K., & Fritz, M. (2020). VisualPhishNet: Zero-Day Phishing Website Detection by Visual Similarity. Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 1681–1698. https://doi.org/10.1145/3372297.3417233

3. Grid search in hyperparameter optimization of machine learning models for prediction of HIV/AIDS test results;Belete DM;International Journal of Computers and Applications,2022

4. Belfedhal, A. E., & Belfedhal, M. A. (2022, December). A Lightweight Phishing Detection System Based on Machine Learning and URL Features. In International Conference on Managing Business Through Web Analytics (pp. 307–319). Cham: Springer International Publishing.

5. Mohammed Belkebir (Eds.), International Conference on Managing Business Through Web Analytics (pp. 307–319). Springer International Publishing. https://doi.org/10.1007/978-3-031- 06971-0_22

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3