Combining Lexical, Host, and Content-based features for Phishing Websites detection using Machine Learning Models

Published: 2024-04-17
ISSN: 2032-9407
Container-title: ICST Transactions on Scalable Information Systems

Authors:

Hamadouche Samiya, Boudraa Ouadjih, Gasmi Mohamed

Abstract

In the cybersecurity field, identifying and handling threats from malicious websites (for example phishing, spam, and drive-by downloads) is a major concern for the community, so effective detection methods have become a necessity. Recent advances in Machine Learning (ML) have renewed interest in its application to a variety of cybersecurity challenges. To detect phishing URLs, machine learning relies on specific attributes, such as lexical, host-based, and content-based features. The main objective of our work is to propose, implement, and evaluate a solution for identifying phishing URLs based on a combination of these feature sets. This paper focuses on using a new balanced dataset, extracting useful features from it, and selecting the optimal features using different feature selection techniques, in order to build four ML models (SVM, Decision Tree, Random Forest, and XGBoost) and conduct a comparative performance evaluation of them. Results showed that the XGBoost model outperformed the other models, with an accuracy of 95.70% and a false negative rate of 1.94%.
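To make the two ideas in the abstract concrete (lexical features derived from the URL string, and evaluation by accuracy and false negative rate), here is a minimal Python sketch. It is an illustration under stated assumptions and not the authors' pipeline: the feature names in `lexical_features` are hypothetical examples of lexical attributes, and `accuracy_and_fnr` assumes the conventional definition FNR = FN / (FN + TP) with label 1 meaning phishing, which the abstract does not spell out.

```python
import re
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL string.

    These features are assumptions chosen for illustration; they are
    not the feature set used in the paper.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        # Dotted-quad pattern only; octet ranges are not validated.
        "host_is_ipv4": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "uses_https": parsed.scheme == "https",
    }


def accuracy_and_fnr(y_true, y_pred):
    """Return (accuracy, false negative rate) for binary labels.

    Assumes 1 = phishing and 0 = legitimate, and FNR = FN / (FN + TP).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fp += 1
    accuracy = (tp + tn) / len(y_true)
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy, fnr


if __name__ == "__main__":
    print(lexical_features("http://192.168.0.1/login@secure"))
    print(accuracy_and_fnr([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In phishing detection a false negative is a phishing URL classified as legitimate, which is why the abstract reports that rate alongside accuracy. Host-based and content-based features would require network lookups and page fetching, so they are left out of this sketch.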

Publisher

European Alliance for Innovation n.o.

Cited by 1 article.

1. A hybrid intrusion detection system with K-means and CNN+LSTM;ICST Transactions on Scalable Information Systems;2024-06-26