Beyond Word-Based Model Embeddings: Contextualized Representations for Enhanced Social Media Spam Detection

Author:

Alshattnawi Sawsan (1), Shatnawi Amani (1), AlSobeh Anas M.R. (1, 2), Magableh Aws A. (1, 3)

Affiliation:

1. Faculty of Computer Science and Information Technology, Yarmouk University, Irbid 21163, Jordan

2. Information Technology, School of Computing, Southern Illinois University Carbondale, 1365 Douglas Drive, Carbondale, IL 62901, USA

3. Software Engineering, Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia

Abstract

As social media platforms continue their exponential growth, so do the threats targeting their security. Detecting disguised spam messages is a major challenge because spammers' tactics evolve constantly. This research investigates advanced artificial intelligence techniques to enhance multiplatform spam classification on Twitter and YouTube. We use state-of-the-art deep neural networks: recurrent architectures with long short-term memory (LSTM) cells, fed by both static and contextualized word embeddings. Extensive comparative experiments are followed by rigorous hyperparameter tuning on the datasets. The results reveal a profound impact of tailored, platform-specific AI techniques in combating sophisticated and perpetually evolving threats. The key innovation lies in tailoring deep learning (DL) architectures to leverage both intrinsic platform contexts and extrinsic contextual embeddings for stronger generalization. The results include consistent accuracy improvements of 10–15% or more on multisource datasets, and they yield actionable guidelines on the optimal components of neural models and embedding strategies for cross-platform defense systems. Contextualized embeddings such as BERT and ELMo consistently outperform their noncontextualized counterparts. The standalone ELMo model with logistic regression emerges as the top performer, attaining accuracy scores of 90% on Twitter and 94% on YouTube data. This indicates the potential of contextualized language representations to capture the subtle semantic signals that are vital for identifying disguised spam. As emerging adversarial attacks exploit human vulnerabilities, advancing defense strategies through enhanced neural language understanding is imperative. We recommend that social media companies and academic researchers build on contextualized language models to strengthen social media security.
This research approach demonstrates the potential of personalized, platform-specific DL techniques to combat the continuously evolving threats to social media security.
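
The abstract reports that embedding features fed to a logistic regression classifier performed best. The following is a minimal sketch of that general pipeline shape (embed a message, then classify it with logistic regression), not the authors' implementation. The two-dimensional word vectors, the example messages, and the names `TOY_VECTORS`, `embed`, `train`, and `predict` are all hypothetical and invented for illustration. A real system would replace `embed` with sentence features from a contextualized model such as ELMo or BERT.

```python
import math

# Hypothetical 2-D "static" word vectors, invented for this illustration only.
TOY_VECTORS = {
    "free": [1.0, 0.0], "win": [0.9, 0.1], "click": [0.8, 0.0],
    "prize": [1.0, 0.1], "video": [0.0, 1.0], "great": [0.1, 0.9],
    "thanks": [0.0, 0.8], "song": [0.1, 1.0], "love": [0.0, 0.9],
}


def embed(message):
    """Mean-pool the vectors of known words; unknown words are skipped."""
    vecs = [TOY_VECTORS[w] for w in message.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(2)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(features, labels, lr=0.5, epochs=200):
    """Fit logistic regression with plain batch gradient descent."""
    w, b = [0.0, 0.0], 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in zip(features, labels):
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w = [w[i] - lr * grad_w[i] / n for i in range(2)]
        b -= lr * grad_b / n
    return w, b


def predict(message, w, b):
    """Return 1 for spam, 0 for legitimate (ham)."""
    x = embed(message)
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0


# Tiny made-up training set: label 1 = spam, 0 = ham.
MESSAGES = [
    ("win free prize", 1), ("click free prize", 1), ("free click win", 1),
    ("great video thanks", 0), ("love song", 0), ("thanks great song", 0),
]
WEIGHTS, BIAS = train([embed(m) for m, _ in MESSAGES], [y for _, y in MESSAGES])
```

For example, `predict("win prize", WEIGHTS, BIAS)` classifies an unseen message using the fitted weights. Because each word here maps to one fixed vector regardless of its neighbors, the sketch shows the static-embedding baseline; contextualized embeddings differ in that the same word receives a different vector depending on the sentence around it.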

Publisher

MDPI AG
