Weighted ensemble classifier for malicious link detection using natural language processing

Published:2023-01-03
ISSN:1742-7371
Container-title:International Journal of Pervasive Computing and Communications
language:en
Short-container-title:IJPCC

Author:

A. Saleem Raja, Balasubaramanian Sundaravadivazhagan, Ganesan Pradeepa, Rajasekaran Justin, R. Karthikeyan

Abstract

Purpose: The internet has become fully integrated into contemporary life, and people depend on internet services for everyday activities. Consequently, an abundance of information about people and organizations is available online, which encourages the proliferation of cybercrime. Cybercriminals often use malicious links, disseminated via email, SMS and social media, for large-scale cyberattacks. Recognizing malicious links online can be exceedingly challenging. The purpose of this paper is to present a strong security system that can detect malicious links in cyberspace using natural language processing techniques.

Design/methodology/approach: Researchers have recommended a variety of approaches, including blacklisting and rules-based machine/deep learning, for automatically recognizing malicious links. These approaches, however, generally require a set of features to be generated in order to generalize the detection process. Most of the features are generated by processing the URL and the content of the web page, along with some external features such as the ranking of the web page and domain name system information. This process of feature extraction and selection is typically time-consuming and demands a high level of domain expertise. Sometimes the generated features do not leverage the full potential of the data set. In addition, the majority of currently deployed systems use a single classifier to classify malicious links, yet prediction accuracy may vary widely depending on the data set and the classifier used.

Findings: To address the issue of generating feature sets, the proposed method uses natural language processing techniques (term frequency and inverse document frequency) to vectorize URLs. To build a robust system for the classification of malicious links, the proposed system implements a weighted soft voting classifier, an ensemble classifier that combines the predictions of base classifiers. The weight assigned to each classifier is based on that classifier's ability or skill.

Originality/value: The proposed method performs better when the optimal weights are assigned. Its performance was assessed on two different data sets (D1 and D2) and compared against base machine learning classifiers and previous research results. The resulting accuracy shows that the proposed method is superior to the existing methods, offering 91.4% and 98.8% accuracy for data sets D1 and D2, respectively.
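The two techniques named in the Findings part of the abstract, TF-IDF vectorization of URLs and weighted soft voting, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the tokenizer, the IDF variant, the base classifiers or the weights, so the `tokenize` rule, the `ln(N / df) + 1` formula, the example URLs, the three probability lists and the weights `[1.0, 2.0, 2.0]` below are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (assumed tokenizer)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def tfidf_vectors(urls):
    """Return (vocabulary, vectors) with weight tf * (ln(N / df) + 1).

    The IDF variant is an assumption; the abstract does not name one.
    """
    docs = [Counter(tokenize(u)) for u in urls]
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vectors = [[d[t] * idf[t] for t in vocab] for d in docs]
    return vocab, vectors


def weighted_soft_vote(probas, weights):
    """Weighted average of class-probability lists, one list per classifier."""
    total = sum(weights)
    n_classes = len(probas[0])
    return [
        sum(w * p[c] for p, w in zip(probas, weights)) / total
        for c in range(n_classes)
    ]


# Hypothetical URLs, used only to show the vectorization step.
urls = ["http://example.com/login", "http://example.org/login/verify"]
vocab, vecs = tfidf_vectors(urls)

# Hypothetical outputs of three base classifiers: [P(benign), P(malicious)].
probas = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
# Assumed weights; a more skilful classifier receives a larger weight.
weights = [1.0, 2.0, 2.0]

avg = weighted_soft_vote(probas, weights)          # roughly [0.32, 0.68]
label = max(range(len(avg)), key=avg.__getitem__)  # index 1 (malicious)
```

With these assumed numbers the two higher-weighted classifiers dominate the average, so the ensemble predicts the second class even though the first classifier disagrees. In practice, scikit-learn's `TfidfVectorizer` and `VotingClassifier(voting="soft", weights=...)` provide the same two building blocks.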

Publisher

Emerald

Subject

General Computer Science, Theoretical Computer Science

References: 27 articles.

1. An optimized stacking ensemble model for phishing websites detection;Electronics,2021

2. Phishing websites classification using hybrid SVM and KNN approach;International Journal of Advanced Computer Science and Applications,2017

3. The spatial analysis of the malicious uniform resource locators (URLs): 2016 dataset case study;Information,2021

4. Phishing website detection using support vector machines and nature-inspired optimization algorithms;Telecommunication Systems,2021

5. A novel ensemble machine learning method to detect phishing attack,2020

Cited by 3 articles.

1. An Abnormal External Link Detection Algorithm Based on Multi-Modal Fusion;International Journal of Information Security and Privacy;2024-02-07

2. A Phishing-Attack-Detection Model Using Natural Language Processing and Deep Learning;Applied Sciences;2023-04-23

3. Malicious Domain Names Detection Algorithm Based on Lexical Analysis and Feature Quantification;IEEE Access;2019