A Discrete Hidden Markov Model for SMS Spam Detection

Author:

Xia Tian, Chen Xuemin

Abstract

Many machine learning methods have been applied to short messaging service (SMS) spam detection, including traditional methods such as naïve Bayes (NB), the vector space model (VSM), and the support vector machine (SVM), and newer methods such as long short-term memory (LSTM) and the convolutional neural network (CNN). These methods are based on the well-known bag of words (BoW) model, which treats a document as an unordered collection of words. This assumption discards an important piece of information: word order. Moreover, term frequency, which counts the occurrences of each word in a message, cannot distinguish the importance of words because SMS messages are so short. This paper proposes a new method based on the discrete hidden Markov model (HMM) that uses word order information and addresses the low term frequency issue in SMS spam detection. The widely used SMS spam dataset from the UCI Machine Learning Repository is used to analyze the performance of the proposed HMM method. Its overall performance is comparable to that of deep learning approaches using CNN and LSTM models. A Chinese SMS spam dataset with 2000 messages is used for further performance evaluation. Experiments show that the proposed HMM method is not language-sensitive and can identify spam with high accuracy on both datasets.
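The word-order point in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two hidden states, the observation labels "neutral" and "spammy", and every probability below are hypothetical values chosen only for illustration. The sketch shows two token sequences that a bag-of-words count cannot tell apart, while the forward algorithm of a discrete HMM assigns them different likelihoods.

```python
from collections import Counter


def bag_of_words(tokens):
    """Count occurrences of each token; the order of tokens is discarded."""
    return Counter(tokens)


def forward_likelihood(obs, start, trans, emit):
    """Return P(obs) under a discrete HMM using the forward algorithm.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of state s emitting observation o
    """
    n_states = len(start)
    # Initialisation: joint probability of the first observation and each state.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    # Recursion: propagate through the transitions, then weight by the emission.
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    # Termination: sum over the final states.
    return sum(alpha)


# Hypothetical toy parameters (state 0 is "ham-like", state 1 is "spam-like").
START = [0.8, 0.2]
TRANS = [[0.7, 0.3],
         [0.1, 0.9]]
EMIT = [{"neutral": 0.9, "spammy": 0.1},
        {"neutral": 0.2, "spammy": 0.8}]

# Two sequences with identical token counts but different order.
seq_a = ["neutral", "neutral", "spammy", "spammy"]
seq_b = ["spammy", "neutral", "spammy", "neutral"]

same_bow = bag_of_words(seq_a) == bag_of_words(seq_b)
p_a = forward_likelihood(seq_a, START, TRANS, EMIT)
p_b = forward_likelihood(seq_b, START, TRANS, EMIT)

print("identical bag of words:", same_bow)
print("HMM likelihoods:", p_a, p_b)
```

With these toy parameters the two sequences have the same bag-of-words representation but different HMM likelihoods, which is the kind of order information a BoW-based classifier cannot use.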

Funder

National Science Foundation

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

References: 59 articles.

Cited by 42 articles.

1. A novel deep learning model-based optimization algorithm for text message spam detection;The Journal of Supercomputing;2024-05-02

2. Next-Gen Phishing Detection System Based on Federated Learning Integrated CNN-LSTM for SMS Communication;2024 5th International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV);2024-03-11

3. Manipulating hidden-Markov-model inferences by corrupting batch data;Computers & Operations Research;2024-02

4. Content Based Classification of Short Messages using Recurrent Neural Networks in NLP;2024 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA);2024-02-01

5. Investigating Evasive Techniques in SMS Spam Filtering: A Comparative Analysis of Machine Learning Models;IEEE Access;2024
