Efficient Ensemble-based Phishing Website Classification Models using Feature Importance Attribute Selection and Hyper parameter Tuning Approaches

Author:

Jimoh R. G, OYELAKIN Akinyemi Moruff, O. C. Abikoye, M. B. Akanbi, M. D Gbolagade, A. O. Akanni, M. A. Jibrin, T. S. Ogundele

Abstract

The internet is now a common venue for business, scientific, and educational activities. However, malicious actors continue to use a variety of attack techniques, among them phishing, to target enterprise networks and the wider internet. Classifying phishing attacks with machine learning (ML) is an active research area in cyber security, since phishing detection is a typical intrusion identification task. ML techniques can be categorized as single learners and ensemble learners, and ensemble learners have been found to be more promising than single classifiers. ML-based detection models can be improved further through feature selection or dimensionality reduction and through hyperparameter tuning. This study focuses on the classification of phishing websites using ensemble learning algorithms. Random Forest (RF) and Extra Trees ensembles were used for the phishing classification. The models built from these algorithms were optimized by applying feature importance attribute selection and hyperparameter tuning. The RF-based phishing classification model achieved 99.3% accuracy, 0.996 recall, 0.983 F1-score, 0.996 precision, and an AUC score of 1.000. The Extra Trees-based model attained 99.1% accuracy, 0.990 recall, 0.981 F1-score, 0.990 precision, and an AUC score of 1.000. The RF-based model thus achieved slightly better classification results than the Extra Trees-based model. The study concluded that the attribute selection and hyperparameter tuning approaches employed are very promising.
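The workflow summarized in the abstract (feature importance attribute selection followed by hyperparameter tuning of Random Forest and Extra Trees ensembles) might be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. The synthetic data, the median importance threshold, the parameter grid, and the helper name `tune` are all assumptions, because the abstract names neither the dataset nor the tuned parameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (assumption): the abstract does not name the dataset.
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def tune(estimator_cls):
    """Hypothetical helper: importance-based selection, then a grid search."""
    pipe = Pipeline([
        # Keep features whose importance is at or above the median
        # (the threshold is an assumption, not taken from the paper).
        ("select", SelectFromModel(
            estimator_cls(n_estimators=50, random_state=0),
            threshold="median",
        )),
        ("clf", estimator_cls(random_state=0)),
    ])
    # Illustrative grid only; the paper's search space is not given.
    grid = {"clf__n_estimators": [50, 100], "clf__max_depth": [None, 10]}
    search = GridSearchCV(pipe, grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    return search


results = {}
for name, cls in [("RandomForest", RandomForestClassifier),
                  ("ExtraTrees", ExtraTreesClassifier)]:
    search = tune(cls)
    selector = search.best_estimator_.named_steps["select"]
    results[name] = {
        "accuracy": accuracy_score(y_test, search.predict(X_test)),
        "auc": roc_auc_score(y_test, search.predict_proba(X_test)[:, 1]),
        "n_selected": int(selector.get_support().sum()),
        "best_params": search.best_params_,
    }

for name, metrics in results.items():
    print(name, metrics)
```

Placing the selector inside the pipeline means feature importances are recomputed within each cross-validation fold, so the held-out fold does not influence which features are kept. Scores on this synthetic data are not comparable to the figures reported in the abstract.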

Publisher

SABA Publishing

