Enhancing Spam Message Classification and Detection Using Transformer-Based Embedding and Ensemble Learning

Published:2023-04-10 Issue:8 Volume:23 Page:3861
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

Ghourabi Abdallah¹², Alohaly Manar³

Affiliation:

1. Department of Computer Science, Jouf University, Sakaka 72388, Saudi Arabia

2. Higher School of Sciences and Technology of Hammam Sousse, University of Sousse, Sousse 4011, Tunisia

3. Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia

Abstract

Over the last decade, the Short Message Service (SMS) has become a primary communication channel. Nevertheless, its popularity has also given rise to so-called SMS spam. Spam messages are annoying and potentially malicious, exposing SMS users to credential theft and data loss. To mitigate this persistent threat, we propose a new model for SMS spam detection based on pre-trained Transformers and Ensemble Learning. The proposed model uses a text embedding technique that builds on the recent advancements of the GPT-3 Transformer. This technique provides a high-quality representation that can improve detection results. In addition, we used an Ensemble Learning method in which four machine learning models were grouped into one model that performed significantly better than its separate constituent parts. The experimental evaluation of the model was performed using the SMS Spam Collection Dataset. The obtained results showed state-of-the-art performance, exceeding all previous works with an accuracy of 99.91%.
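
The abstract does not name the four constituent models or the rule used to combine them. The sketch below illustrates one common way to group several classifiers into one model, soft voting, in which each member outputs a spam probability and the ensemble averages them. Everything in it is an assumption made for illustration: the helper names `make_linear_member` and `soft_vote`, the hand-picked weights, and the 3-dimensional vectors standing in for real GPT-3 text embeddings. It is not the authors' implementation.

```python
import math
from statistics import fmean
from typing import Callable, List, Sequence, Tuple

Embedding = Sequence[float]
Member = Callable[[Embedding], float]  # maps an embedding to P(spam)


def make_linear_member(weights: Sequence[float], bias: float) -> Member:
    """Build a toy logistic scorer; a stand-in for a trained classifier."""
    def predict_proba(embedding: Embedding) -> float:
        score = sum(w * x for w, x in zip(weights, embedding)) + bias
        return 1.0 / (1.0 + math.exp(-score))
    return predict_proba


def soft_vote(members: List[Member], embedding: Embedding,
              threshold: float = 0.5) -> Tuple[str, float]:
    """Average the members' spam probabilities and threshold the mean."""
    p_spam = fmean([member(embedding) for member in members])
    label = "spam" if p_spam >= threshold else "ham"
    return label, p_spam


# Four hypothetical members with hand-picked weights (not learned from data).
MEMBERS = [
    make_linear_member([2.0, 0.0, 0.0], -1.0),
    make_linear_member([0.0, 2.0, 0.0], -1.0),
    make_linear_member([0.0, 0.0, 2.0], -1.0),
    make_linear_member([1.0, 1.0, 1.0], -1.5),
]

# Made-up 3-dimensional "embeddings"; real text embeddings are much larger.
spam_like = [1.0, 1.0, 0.0]
ham_like = [0.0, 0.0, 0.0]

spam_label, spam_prob = soft_vote(MEMBERS, spam_like)
ham_label, ham_prob = soft_vote(MEMBERS, ham_like)
print(spam_label, round(spam_prob, 3))
print(ham_label, round(ham_prob, 3))
```

For the `spam_like` vector, the third member alone would predict ham, but the averaged probability still crosses the threshold. This is the sense in which an ensemble can outperform its individual parts: a single member's error is outvoted by the others.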

Funder

Princess Nourah bint Abdulrahman University

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/23/8/3861/pdf

References: 32 articles.

1. SlickText (2023, February 26). 44 Mind-Blowing SMS Marketing and Texting Statistics. Available online: https://www.slicktext.com/blog/2018/11/44-mind-blowing-sms-marketing-and-texting-statistics/.

2. SmiDCA: An Anti-Smishing Model with Machine Learning Approach;Sonowal;Comput. J.,2018

3. SlickText (2023, February 26). 17 Spam Text Statistics & Spam Text Examples. Available online: https://www.slicktext.com/blog/2022/10/17-spam-text-statisitics-for-2022/.

4. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding;Devlin;Long and Short Papers, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019,2019

5. Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., and Askell, A. (2020, December 6–12). Language Models Are Few-Shot Learners. Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada.

Cited by 16 articles.

1. A survey of large language models for cyber threat detection;Computers & Security;2024-10

2. Extending limited datasets with GAN-like self-supervision for SMS spam detection;Computers & Security;2024-10

3. An Investigation of AI-Based Ensemble Methods for the Detection of Phishing Attacks;Engineering, Technology & Applied Science Research;2024-06-01

4. SMS Spam Detection using NLP and Deep Learning Recurrent Neural Network Variants;2024 International Conference on Cognitive Robotics and Intelligent Systems (ICC-ROBINS);2024-04-17

5. Scalable Learning Framework for Detecting New Types of Twitter Spam with Misuse and Anomaly Detection;Sensors;2024-04-02