Exploring the effectiveness of word embedding based deep learning model for improving email classification

Published:2022-02-02 Issue:4 Volume:56 Page:483-505
ISSN:2514-9288
Container-title:Data Technologies and Applications
language:en
Short-container-title:DTA

Author:

Asudani Deepak Suresh, Nagwani Naresh Kumar, Singh Pradeep

Abstract

Purpose: Classifying emails as ham or spam based on their content is essential. The most difficult challenge in email categorization is determining the semantic and syntactic meaning of words and representing them as high-dimensional feature vectors for processing. The purpose of this paper is to examine the effectiveness of pre-trained embedding models for classifying emails using deep learning classifiers such as the long short-term memory (LSTM) model and the convolutional neural network (CNN) model.

Design/methodology/approach: In this paper, global vectors (GloVe) and Bidirectional Encoder Representations from Transformers (BERT) pre-trained word embeddings are used to identify relationships between words, which helps to classify emails into their relevant categories using machine learning and deep learning models. Two benchmark datasets, SpamAssassin and Enron, are used in the experimentation.

Findings: In the first set of experiments, among the machine learning classifiers, the support vector machine (SVM) model performs better than the other machine learning methodologies. The second set of experiments compares deep learning model performance without embedding, with GloVe embedding and with BERT embedding. The experiments show that GloVe embedding can be helpful for faster execution with better performance on large-sized datasets.

Originality/value: The experiments reveal that, for classifying an email as ham or spam, the CNN model with GloVe embedding gives slightly better accuracy than the model with BERT embedding and traditional machine learning algorithms. It is concluded that word embedding models improve the accuracy of email classifiers.

Publisher

Emerald

Subject

Library and Information Sciences, Information Systems

Reference: 58 articles.

1. Classification of poetry text into the emotional states using deep learning technique;IEEE Access,2020

2. Using the contextual language model BERT for multi-criteria classification of scientific articles;Journal of Biomedical Informatics,2020

3. Deep neural network and model-based clustering technique for forensic electronic mail author attribution;SN Applied Sciences,2021

4. Benchmarking performance of machine and deep learning-based methodologies for Urdu text document classification;Neural Computing and Applications,2021

5. Malicious text identification: deep learning from public comments and emails;Information (Switzerland),2020

Cited by 6 articles.

1. Avionics fault classification based on improved Word2Vec word embedding;Third International Symposium on Computer Applications and Information Systems (ISCAIS 2024);2024-07-11

2. Balance Relation-Aware Attention Embedding Model for Knowledge Graph Completion;2023 IEEE Pune Section International Conference (PuneCon);2023-12-14

3. A comparative evaluation of machine learning and deep learning algorithms for question categorization of VQA datasets;Multimedia Tools and Applications;2023-12-13

4. Analysis of BERT Email Spam Classifier Against Adversarial Attacks;2023 International Conference on Artificial Intelligence and Smart Communication (AISC);2023-01-27

5. An Advanced Deep Attention Collaborative Mechanism for Secure Educational Email Services;Computational Intelligence and Neuroscience;2022-04-26