Analysis of the Performance Impact of Fine-Tuned Machine Learning Model for Phishing URL Detection

Author:

Abdul Samad Saleem Raja (1), Balasubaramanian Sundarvadivazhagan (2), Al-Kaabi Amna Salim (1), Sharma Bhisham (3), Chowdhury Subrata (4), Mehbodniya Abolfazl (5), Webber Julian L. (5), Bostani Ali (6)

Affiliation:

1. IT Department, University of Technology and Applied Sciences, Shinas 324, Oman

2. IT Department, University of Technology and Applied Sciences, Al-Musannah 314, Oman

3. Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura 140401, Punjab, India

4. Department of Computer Science and Engineering, Sreenivasa Institute of Technology and Management Studies, Chittoor 517127, Andhra Pradesh, India

5. Department of Electronics and Communication Engineering, Kuwait College of Science and Technology (KCST), Doha Area, 7th Ring Road, Kuwait 7207, Kuwait

6. College of Engineering and Applied Sciences, American University of Kuwait, Salmiya 20002, Kuwait

Abstract

Phishing exploits people’s tendency to share personal information online. Phishing attacks often begin with an email and can serve a variety of purposes. The cybercriminal uses social engineering techniques to persuade the target to click a link in the phishing email, which leads to an infected website. These attacks grow more sophisticated as attackers personalize their fraud and craft convincing messages. Phishing with a malicious URL is an advanced form of cybercrime, and even cautious users may find phishing URLs hard to spot. Researchers have proposed various techniques to address this challenge. Machine learning models improve detection by using URLs, web page content and external features. This article presents the findings of an experimental study that sought to improve the accuracy of machine learning models on two of the most commonly used phishing datasets. Three types of tuning factors are applied: data balancing, hyperparameter optimization and feature selection. The experiment uses the eight most prevalent machine learning methods and two distinct datasets obtained from online sources (the UCI repository and the Mendeley repository). The results show that data balancing improves accuracy marginally, whereas hyperparameter optimization and feature selection improve accuracy significantly. Combining all of the tuning factors further improves the performance of the machine learning algorithms, outperforming existing research works. For Dataset-1, Random Forest (RF) and Extreme Gradient Boosting (XGB) achieve accuracy rates of 97.44% and 97.47%, respectively. For Dataset-2, Gradient Boosting (GB) and Extreme Gradient Boosting (XGB) achieve accuracy rates of 98.27% and 98.21%, respectively.
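The three tuning factors named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. The abstract does not say which balancing, selection or search techniques the authors used, so everything here is an assumption made for illustration: random oversampling stands in for data balancing, a class-mean-difference filter for feature selection, and a small grid search over the neighbour count of a toy k-nearest-neighbour classifier for hyperparameter optimization. The function names and the toy feature rows are hypothetical and are not taken from the paper or its datasets.

```python
import random
from collections import Counter


def oversample(X, y, seed=0):
    """Data balancing (assumed technique): randomly duplicate rows of the
    smaller classes until every class has as many rows as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def select_features(X, y, k):
    """Feature selection (assumed technique): keep the k feature indices whose
    mean differs most between class 1 and class 0. Both classes must be present."""
    def mean(values):
        return sum(values) / len(values)

    scores = []
    for j in range(len(X[0])):
        pos = [x[j] for x, lab in zip(X, y) if lab == 1]
        neg = [x[j] for x, lab in zip(X, y) if lab == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def knn_predict(X_train, y_train, x, k):
    """Toy k-nearest-neighbour classifier using squared Euclidean distance."""
    nearest = sorted(
        zip(X_train, y_train),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]


def accuracy(X_train, y_train, X_val, y_val, k):
    """Fraction of validation rows that the k-NN classifier labels correctly."""
    hits = sum(knn_predict(X_train, y_train, x, k) == lab
               for x, lab in zip(X_val, y_val))
    return hits / len(y_val)


def tune_k(X_train, y_train, X_val, y_val, grid=(1, 3, 5)):
    """Hyperparameter optimization (assumed technique): return the grid value
    of k with the highest validation accuracy (first one wins on ties)."""
    return max(grid, key=lambda k: accuracy(X_train, y_train, X_val, y_val, k))


# Hypothetical toy rows: [scaled URL length, has '@' symbol, noise]; 1 = phishing.
X = [[0.9, 1, 0.5], [0.8, 1, 0.4], [0.2, 0, 0.5],
     [0.1, 0, 0.4], [0.3, 0, 0.6], [0.2, 0, 0.5]]
y = [1, 1, 0, 0, 0, 0]

X_bal, y_bal = oversample(X, y)
kept = select_features(X_bal, y_bal, k=2)
print("class counts after balancing:", Counter(y_bal))
print("selected feature indices:", kept)
```

In practice each step would normally come from an established library and be evaluated with cross-validation rather than a single hold-out split; the sketch only shows how the three factors fit together.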

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Cited by 4 articles.