A Discrete Hidden Markov Model for SMS Spam Detection

Published:2020-07-21 Issue:14 Volume:10 Page:5011
ISSN:2076-3417
Container-title:Applied Sciences
language:en

Author:

Xia Tian, Chen Xuemin

Abstract

Many machine learning methods have been applied to short messaging service (SMS) spam detection, including traditional methods such as naïve Bayes (NB), the vector space model (VSM), and the support vector machine (SVM), and newer methods such as long short-term memory (LSTM) and the convolutional neural network (CNN). These methods are based on the well-known bag of words (BoW) model, which assumes that documents are unordered collections of words. This assumption overlooks an important piece of information: word order. Moreover, because SMS messages are short, term frequency, which counts the occurrences of each word in a message, cannot distinguish the importance of words. This paper proposes a new method based on the discrete hidden Markov model (HMM) that uses word order information and addresses the low term frequency issue in SMS spam detection. The widely used SMS spam dataset from the UCI machine learning repository is used to analyze the performance of the proposed HMM method. Its overall performance is comparable to that of deep learning approaches using CNN and LSTM models. A Chinese SMS spam dataset with 2000 messages is used for further performance evaluation. Experiments show that the proposed HMM method is not language-sensitive and can identify spam with high accuracy on both datasets.
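The contrast between BoW and an order-aware model can be illustrated with a minimal sketch. It is not the authors' trained model or their classification procedure: the two hidden states `s0` and `s1`, the two-word vocabulary, and every probability in `START`, `TRANS`, and `EMIT` are hand-picked assumptions, and the function names are hypothetical. It shows only that two messages with identical word counts can receive different likelihoods under a discrete HMM, computed here with the standard forward algorithm.

```python
import math
from collections import Counter


def bag_of_words(message):
    """Return the unordered word counts of a message (the BoW view)."""
    return Counter(message.lower().split())


def forward_log_likelihood(words, start, trans, emit, floor=1e-6):
    """Log-likelihood of a word sequence under a discrete HMM.

    Uses the forward algorithm; words missing from an emission table
    get a small floor probability (an assumption of this sketch).
    """
    states = list(start)
    # alpha[s] = P(words seen so far, current hidden state = s)
    alpha = {s: start[s] * emit[s].get(words[0], floor) for s in states}
    for word in words[1:]:
        alpha = {
            s: sum(alpha[p] * trans[p][s] for p in states)
            * emit[s].get(word, floor)
            for s in states
        }
    return math.log(sum(alpha.values()))


# Toy parameters, invented for illustration only.
START = {"s0": 0.9, "s1": 0.1}
TRANS = {
    "s0": {"s0": 0.2, "s1": 0.8},
    "s1": {"s0": 0.2, "s1": 0.8},
}
EMIT = {
    "s0": {"free": 0.9, "call": 0.1},
    "s1": {"free": 0.1, "call": 0.9},
}

if __name__ == "__main__":
    a, b = "free call", "call free"
    # Same bag of words for both orderings.
    print(bag_of_words(a) == bag_of_words(b))
    # The HMM assigns each ordering its own log-likelihood.
    print(forward_log_likelihood(a.split(), START, TRANS, EMIT))
    print(forward_log_likelihood(b.split(), START, TRANS, EMIT))
```

For longer messages the plain product of probabilities can underflow, so a practical version would rescale `alpha` at each step or work in log space throughout.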

Funder

National Science Foundation

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/10/14/5011/pdf

Reference: 59 articles.

1. Portio Research. Worldwide A2P SMS Markets 2014–2017: Understanding and Analysis of Application-to-Person Text Messaging Markets Worldwide, 2014

2. Short Messages Spam Filtering Combining Personality Recognition and Sentiment Analysis

3. Statista. A2P and P2P SMS Market Revenue Worldwide from 2017 to 2022 (in Billion U.S. Dollars). https://www.statista.com/statistics/485153/a2p-sms-market-size-worldwide/

4. A Review on Mobile SMS Spam Filtering Techniques

5. Spam: Its past, present, and future

Cited by 42 articles.

1. A novel deep learning model-based optimization algorithm for text message spam detection;The Journal of Supercomputing;2024-05-02

2. Next-Gen Phishing Detection System Based on Federated Learning Integrated CNN-LSTM for SMS Communication;2024 5th International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV);2024-03-11

3. Manipulating hidden-Markov-model inferences by corrupting batch data;Computers & Operations Research;2024-02

4. Content Based Classification of Short Messages using Recurrent Neural Networks in NLP;2024 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA);2024-02-01

5. Investigating Evasive Techniques in SMS Spam Filtering: A Comparative Analysis of Machine Learning Models;IEEE Access;2024