Hybrid Features by Combining Visual and Text Information to Improve Spam Filtering Performance

Published: 2022-06-30 Issue: 13 Volume: 11 Page: 2053
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics

Author:

Nam Seong-Guk, Jang Yonghun, Lee Dong-Gun, Seo Yeong-Seok

Abstract

The development of information and communication technology has brought many benefits, including convenience; however, unsolicited communication such as spam has also become common. Spam is the indiscriminate transmission of unwanted information by anonymous users, called spammers. Spam is sent in various forms, such as SMS, e-mail, and social network service posts; it degrades users' experience of these services and creates costs, such as unnecessarily large amounts of network traffic. Spam content also includes phishing, hype or false advertising, and illegal material. Recently, spammers have also used images with stimulating content to attract users' curiosity and attention. Because images carry more complex information than text, image spam is harder to analyze, and its properties are harder to generalize. Existing text-based spam detectors are therefore vulnerable to image spam, which lowers service quality. This paper proposes a method that builds hybrid features by combining visual and text information, with the aim of reducing misclassification. The method uses three sub-models that extract features from spam images and a classifier model that produces the final prediction from those features. The sub-models extract topic-based, word-embedding-based, and image-embedding-based features, respectively, using optical character recognition (OCR), latent Dirichlet allocation (LDA), and Word2Vec. To evaluate spam image classification performance, spam classifiers were trained on the extracted features and the results were measured with a confusion matrix. Our model achieved an accuracy of 0.9814 and a macro-F1 score of 0.9813. Applying OCR evasion techniques reduced OCR recognition performance; under these conditions, the proposed model obtained a mean macro-F1 score of 0.9607.
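The abstract describes the pipeline only at a high level. As a rough illustration, here is a minimal, stdlib-only Python sketch of two steps it mentions: fusing the three feature types into one hybrid vector, and scoring predictions with a confusion matrix, accuracy, and macro-F1. It assumes the fusion is a plain concatenation of an LDA topic distribution, the average of Word2Vec vectors for the OCR tokens, and an image embedding; the paper may fuse features differently. The function names (`mean_word_vector`, `hybrid_features`, etc.), toy vectors, and labels are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Sequence


def mean_word_vector(tokens: Sequence[str], word_vectors: Dict[str, List[float]],
                     dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens; zero vector if none are known.

    Assumption: a document-level word feature is the mean of per-token
    Word2Vec-style vectors for the text recovered by OCR.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def hybrid_features(topic_dist: Sequence[float], tokens: Sequence[str],
                    word_vectors: Dict[str, List[float]],
                    image_embedding: Sequence[float], dim: int) -> List[float]:
    """Hypothetical early fusion: concatenate topic, word, and image features."""
    return (list(topic_dist)
            + mean_word_vector(tokens, word_vectors, dim)
            + list(image_embedding))


def confusion_matrix(y_true, y_pred, labels) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def macro_f1(matrix: List[List[int]]) -> float:
    """Unweighted mean of per-class F1 scores."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, actually not k
        fn = sum(matrix[k]) - tp                       # actually k, predicted not k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / n


if __name__ == "__main__":
    # Toy values for one OCR'd image; all numbers are made up for illustration.
    vec = hybrid_features([0.7, 0.3], ["free", "prize", "zzz"],
                          {"free": [1.0, 0.0], "prize": [0.0, 1.0]},
                          [0.5, 0.5, 0.5], dim=2)
    print(vec)  # [0.7, 0.3, 0.5, 0.5, 0.5, 0.5, 0.5]

    labels = ["ham", "spam"]
    y_true = ["spam", "spam", "spam", "spam", "ham", "ham"]
    y_pred = ["spam", "spam", "spam", "ham", "ham", "spam"]
    cm = confusion_matrix(y_true, y_pred, labels)
    print(cm)                      # [[1, 1], [1, 3]]
    print(round(accuracy(cm), 4))  # 0.6667
    print(macro_f1(cm))            # 0.625
```

Because macro-F1 gives every class equal weight, it can diverge from accuracy when classes are imbalanced, as in the toy example (0.6667 vs. 0.625); the two values reported in the abstract being nearly equal suggests balanced per-class performance.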

Funder

National Research Foundation of Korea

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/11/13/2053/pdf

References: 51 articles.

Cited by 7 articles.

1. Next-Generation Spam Filtering: Comparative Fine-Tuning of LLMs, NLPs, and CNN Models for Email Spam Classification; Electronics; 2024-05-23

2. Analysis of Machine Learning Models for Spam Email Detection and Real-Time Integration; 2024 International Conference on Science, Engineering and Business for Driving Sustainable Development Goals (SEB4SDG); 2024-04-02

3. Hybrid Machine Learning Algorithms for Email and Malware Spam Filtering: A Review; European Journal of Theoretical and Applied Sciences; 2024-03-01

4. Email Security Issues, Tools, and Techniques Used in Investigation; Sustainability; 2023-07-05

5. Deep learning-based spam image filtering; Alexandria Engineering Journal; 2023-04