Exploring the effectiveness of word embedding based deep learning model for improving email classification

Author:

Asudani Deepak Suresh, Nagwani Naresh Kumar, Singh Pradeep

Abstract

Purpose: Classifying emails as ham or spam based on their content is essential. The most difficult challenge in email categorization is determining the semantic and syntactic meaning of words and representing them as high-dimensional feature vectors for processing. The purpose of this paper is to examine the effectiveness of pre-trained embedding models for classifying emails using deep learning classifiers such as the long short-term memory (LSTM) model and the convolutional neural network (CNN) model.

Design/methodology/approach: In this paper, global vectors (GloVe) and Bidirectional Encoder Representations from Transformers (BERT) pre-trained word embeddings are used to identify relationships between words, which helps to classify emails into their relevant categories using machine learning and deep learning models. Two benchmark datasets, SpamAssassin and Enron, are used in the experimentation.

Findings: In the first set of experiments, which compares machine learning classifiers, the support vector machine (SVM) model performs better than the other machine learning methods. The second set of experiments compares deep learning model performance without embedding, with GloVe embedding and with BERT embedding. The experiments show that GloVe embedding can be helpful for faster execution with better performance on large datasets.

Originality/value: The experiments reveal that the CNN model with GloVe embedding gives slightly better accuracy than the model with BERT embedding and than traditional machine learning algorithms when classifying an email as ham or spam. It is concluded that word embedding models improve the accuracy of email classifiers.
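The general idea of turning an email into a feature vector through a word-embedding lookup can be illustrated with a minimal sketch. This is not the authors' pipeline: the three-dimensional vectors in `TOY_EMBEDDINGS` are hypothetical stand-ins for pre-trained GloVe vectors, mean pooling is one simple way to combine word vectors, and the nearest-centroid rule in `classify` is a deliberately simple substitute for the CNN and LSTM classifiers named in the abstract. All function and variable names are assumptions made for illustration.

```python
import math

# Hypothetical 3-dimensional vectors standing in for pre-trained GloVe
# embeddings (real GloVe vectors typically have 50 to 300 dimensions).
TOY_EMBEDDINGS = {
    "free":    [0.9, 0.1, 0.0],
    "winner":  [0.8, 0.2, 0.1],
    "prize":   [0.9, 0.0, 0.2],
    "meeting": [0.1, 0.9, 0.1],
    "report":  [0.0, 0.8, 0.2],
    "project": [0.1, 0.9, 0.0],
}
DIM = 3


def embed(text):
    """Mean-pool the vectors of known words; unknown words are skipped."""
    vectors = [TOY_EMBEDDINGS[w] for w in text.lower().split()
               if w in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def centroid(texts):
    """Average the pooled vectors of several example emails."""
    points = [embed(t) for t in texts]
    return [sum(col) / len(points) for col in zip(*points)]


def classify(text, centroids):
    """Return the label whose centroid is closest in Euclidean distance.

    Assumption: a nearest-centroid rule replaces the deep learning
    classifiers purely to keep the sketch short.
    """
    v = embed(text)
    return min(centroids, key=lambda label: math.dist(v, centroids[label]))


# Tiny made-up training examples, not drawn from SpamAssassin or Enron.
centroids = {
    "spam": centroid(["free prize winner", "winner free"]),
    "ham": centroid(["project meeting report", "meeting report"]),
}
```

Calling `classify("claim your free prize", centroids)` picks the label whose centroid lies nearest to the pooled vector of the known words in the message. In a realistic setting the lookup table would be loaded from a pre-trained embedding file, and the pooled or sequence representation would feed a trained neural classifier.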

Publisher

Emerald

Subject

Library and Information Sciences, Information Systems

Cited by 6 articles.

1. Avionics fault classification based on improved Word2Vec word embedding;Third International Symposium on Computer Applications and Information Systems (ISCAIS 2024);2024-07-11

2. Balance Relation-Aware Attention Embedding Model for Knowledge Graph Completion;2023 IEEE Pune Section International Conference (PuneCon);2023-12-14

3. A comparative evaluation of machine learning and deep learning algorithms for question categorization of VQA datasets;Multimedia Tools and Applications;2023-12-13

4. Analysis of BERT Email Spam Classifier Against Adversarial Attacks;2023 International Conference on Artificial Intelligence and Smart Communication (AISC);2023-01-27

5. An Advanced Deep Attention Collaborative Mechanism for Secure Educational Email Services;Computational Intelligence and Neuroscience;2022-04-26
