MMTD: A Multilingual and Multimodal Spam Detection Model Combining Text and Document Images

Author:

Zhang Ziqi 1, Deng Zhaohong 1, Zhang Wei 1, Bu Lingchao 2

Affiliation:

1. The School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China

2. The Tianjin R & D Center, Beijing Eyou Information Technology Co., Ltd., Beijing 100023, China

Abstract

Spam detection has been researched extensively, but multimodal spam detection has received little attention. In this study, we introduce an approach to multilingual multimodal spam detection: the Multilingual and Multimodal Spam Detection Model combining Text and Document Images (MMTD). Unlike previous methods, the proposed model uses a document image encoder to extract image features from an image of the entire email, so that a single image captures both its textual and visual content. A multilingual text encoder extracts textual features, enabling the model to process the multilingual text found in emails, and a multimodal fusion module fuses the two sets of features. To address the scarcity of large multilingual multimodal spam datasets, we introduce a new multilingual multimodal spam detection dataset comprising over 30,000 samples, which is the largest dataset of its kind to date. Extensive experiments were conducted on this dataset, and the model's performance was validated using five-fold cross-validation. The results show that our model achieves state-of-the-art performance, with an accuracy of 99.8%, outperforming the other advanced methods it was compared against.
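The five-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the abstract does not say whether samples were shuffled or stratified, so the shuffling, the seed, and the names `k_fold_indices`, `cross_validate`, `train_fn`, and `predict_fn` are all assumptions made for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Assumption: shuffle once with a fixed seed before splitting.
    random.Random(seed).shuffle(indices)
    # Strided slicing gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(samples, labels, train_fn, predict_fn, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average accuracy.

    train_fn(samples, labels) -> model and predict_fn(model, sample) -> label
    are hypothetical callables standing in for any classifier.
    """
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(samples), k, seed):
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies, mean(accuracies)
```

Each sample is used for testing exactly once and for training in the other four folds, so the averaged accuracy depends less on any single train/test split than a one-off hold-out estimate would.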

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

