SpotSpam: Intention Analysis–driven SMS Spam Detection Using BERT Embeddings

Author:

Oswald C. (1), Simon Sona Elza (2), Bhattacharya Arnab (1)

Affiliation:

1. Indian Institute of Technology, Kanpur, India

2. Indian Institute of Information Technology Design and Manufacturing, Kancheepuram, India

Abstract

Short Message Service (SMS) is one of the most widely used mobile applications for global communication, serving both personal and business purposes. Its widespread use for customer interaction, business updates, and reminders has made “Text Marketing” a billion-dollar industry. Along with legitimate SMS, a flood of spam messages also arrives, serving various purposes for the sender, and the majority of them are fraudulent. Filtering spam SMS accurately is a crucial and challenging task that benefits people both mentally and economically. The challenges in filtering spam SMS include the small number of characters per message, texts written in informal language, and the lack of public SMS spam corpora. Focusing solely on the textual features of the SMS is a major handicap of existing methods, as they fail to adapt dynamically to the growing number of new keywords and jargon. In this article, we develop an intention-based approach to SMS spam filtering that efficiently handles dynamic keywords by focusing on the semantics of the words. We capture both semantic and textual features of the short-text messages based on 13 pre-defined intention labels. The contextual embeddings of the texts are generated using various pre-trained Natural Language Processing (NLP) models. Finally, intention scores are computed for the pre-defined labels, and several supervised learning classifiers are employed to classify each message as spam or ham. Our approaches are evaluated through extensive experiments on the SMS Spam Collection [24] benchmark dataset. Our model achieved an accuracy of 98.07% and Precision and Recall of ∼0.97, which is better than a few of the existing state-of-the-art alternatives. Though the accuracy of our approach is not the best among existing approaches, the model is highly stable due to its emphasis on extracting contextual features from the text through intention labels.
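The abstract does not specify how the intention scores are computed, so the following is only a minimal sketch of one plausible reading: each message and each intention label is embedded, and the score for a label is the cosine similarity between the two vectors. The function `toy_embed` is a hypothetical stand-in for a pre-trained contextual encoder such as BERT (it merely hashes words into a small vector), and the three labels are invented for illustration; they are not the paper's 13 labels.

```python
import math
import zlib

DIM = 64  # toy embedding size (assumption); a BERT-style encoder would output far more dimensions

# Hypothetical intention labels for illustration only; not the paper's 13 labels.
INTENTION_LABELS = ["win a prize", "claim free offer", "meet for lunch"]


def toy_embed(text):
    """Stand-in for a pre-trained encoder: hashed word counts, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        word = token.strip(".,!?")
        if word:
            # crc32 is deterministic across runs, unlike Python's built-in hash()
            vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two already-normalised vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def intention_scores(message, labels=INTENTION_LABELS):
    """Return one score per label: similarity between the message and the label text."""
    msg_vec = toy_embed(message)
    return [cosine(msg_vec, toy_embed(label)) for label in labels]


scores = intention_scores("You win a free prize, claim now")
print(dict(zip(INTENTION_LABELS, scores)))
```

Under this reading, the resulting score vector (one entry per intention label) would serve as the feature vector passed to a supervised classifier that outputs spam or ham. With real contextual embeddings, semantically related but unseen keywords should still score close to the relevant label, which is the property the abstract attributes to the intention-based approach.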

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications

Cited by 20 articles.

1. An integrated model based on deep learning classifiers and pre-trained transformer for phishing URL detection; Future Generation Computer Systems; 2024-12

2. INCEPT: A Framework for Duplicate Posts Classification with Combined Text Representations; ACM Transactions on the Web; 2024-08-16

3. Multilingual SMS Spam Detection using BERT and LSTM; 2024 International Conference on Innovations and Challenges in Emerging Technologies (ICICET); 2024-06-07

4. From Chatbots to Phishbots?: Phishing Scam Generation in Commercial Large Language Models; 2024 IEEE Symposium on Security and Privacy (SP); 2024-05-19

5. On SMS Phishing Tactics and Infrastructure; 2024 IEEE Symposium on Security and Privacy (SP); 2024-05-19
