A Self-Supervised Approach to Comment Spam Detection Based on Content Analysis

Published: 2011-01; Volume: 5; Issue: 1; Pages: 14-32
ISSN: 1930-1650
Journal: International Journal of Information Security and Privacy
Language: English

Authors:

Bhattarai A.¹, Dasgupta D.¹

Affiliation:

1. University of Memphis, USA

Abstract

This paper studies the problems and threats posed by a type of spam in the blogosphere known as blog comment spam. It explores the challenges that comment spam introduces, and the analysis generalizes substantially to other kinds of short-text spam. The authors analyze several high-level features of spam and legitimate comments based on the content of blog postings, and use these features to cluster the data separately for each feature with the K-Means clustering algorithm. They also apply self-supervised learning to classify spam and legitimate comments automatically. Because it requires minimal human intervention, the approach is more flexible and more adaptable to its environment than existing solutions. A preliminary evaluation of the proposed spam detection system shows promising results.
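The abstract describes two stages: each content feature is clustered on its own with K-Means, and a self-supervised step then labels comments without human annotation. The Python sketch below illustrates one way those stages could fit together; it is not the authors' implementation. The feature names (`links`, `post_similarity`, `caps_ratio`), the choice of k = 2 per feature, the assumption that the spam-leaning direction of each feature is known in advance, the majority vote that combines the per-feature clusters, and the toy data are all hypothetical choices made for illustration.

```python
from statistics import mean


def kmeans_1d(values, k=2, max_iter=100):
    """Lloyd's K-Means on one-dimensional data; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic initialisation: centroids spread evenly over the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:  # converged: centroids no longer move
            break
        centroids = updated
    return centroids, labels


def fit(comments, spam_high):
    """Cluster every feature separately (k = 2); no human labels are used.

    spam_high maps each hypothetical feature name to True if larger values are
    ASSUMED to look spammier. Returns {feature: (legit_centroid, spam_centroid)}.
    """
    model = {}
    for feat, high_is_spam in spam_high.items():
        centroids, _ = kmeans_1d([c[feat] for c in comments])
        low, high = sorted(centroids)
        # Orient the two clusters using the assumed spam direction of this feature.
        model[feat] = (low, high) if high_is_spam else (high, low)
    return model


def classify(model, comment):
    """Majority vote over per-feature nearest-centroid decisions: 1 = spam, 0 = legitimate."""
    spam_votes = sum(
        1
        for feat, (legit_c, spam_c) in model.items()
        if abs(comment[feat] - spam_c) < abs(comment[feat] - legit_c)
    )
    return 1 if 2 * spam_votes > len(model) else 0


# Toy, made-up data: three hypothetical content features per comment.
comments = [
    {"links": 0, "post_similarity": 0.62, "caps_ratio": 0.02},
    {"links": 1, "post_similarity": 0.55, "caps_ratio": 0.05},
    {"links": 0, "post_similarity": 0.70, "caps_ratio": 0.03},
    {"links": 6, "post_similarity": 0.05, "caps_ratio": 0.40},
    {"links": 8, "post_similarity": 0.10, "caps_ratio": 0.55},
    {"links": 5, "post_similarity": 0.12, "caps_ratio": 0.35},
]
spam_high = {"links": True, "post_similarity": False, "caps_ratio": True}

model = fit(comments, spam_high)
# Self-supervision: the clusters produce pseudo-labels for the unlabelled comments.
pseudo = [classify(model, c) for c in comments]
print(pseudo)  # [0, 0, 0, 1, 1, 1]
```

Clustering each feature in one dimension means distances are never compared across features, so the sketch needs no feature scaling; the same per-feature centroids then classify new comments by vote. A real system would still need a principled way to orient each feature's clusters and to handle features whose values do not separate cleanly into two groups.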

Publisher

IGI Global

Subject

Information Systems

References (44 articles)

1. Akismet. (n.d.). Home. Retrieved from http://akismet.com/

2. Assis, F. (2006). A text classification module for Lua – the importance of the training method. In Proceedings of the 15th Text Retrieval Conference, Gaithersburg, MD.

3. Becchetti, L., Castillo, C., Donato, D., Leonardi, S., & Baeza-Yates, R. (2005). Link-based Characterization and Detection of Web Spam. In Proceedings of the 2nd International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), Seattle, WA.

4. Bhattarai, A., Rus, V., & Dasgupta, D. (2009, March). Characterizing Comment Spam in the Blogosphere through Content Analysis. In Proceedings of the Symposium on Computational Intelligence in Cyber Security (CICS), IEEE Symposium Series on Computational Intelligence (SSCI 2009).

5. Blum, A., & Mitchell, T. (1998). Combining Labeled and Unlabeled Data with Co-Training. In Proceedings of the 11th Annual Conference on Computational Learning Theory.

Cited by 12 articles.

1. Evaluation of AI Techniques for Detecting Deceptive Reviews in Cyberspace: A Study of Pre- and Post-COVID-19 Trends. 2023 Second International Conference on Electronics and Renewable Systems (ICEARS), 2023-03-02.

2. Self-Supervised Learning Implementation for Malware Detection. 2022 8th International Conference on Wireless and Telematics (ICWT), 2022-07-21.

3. A Systematic Review on Spam Filtering Techniques based on Natural Language Processing Framework. 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), 2021-01-28.

4. Detection of Inappropriate Anonymous Comments Using NLP and Sentiment Analysis. Learning and Analytics in Intelligent Systems, 2019-07-13.

5. Online Fake Comments Detecting Model Based on Feature Analysis. 2018 International Conference on Smart Grid and Electrical Automation (ICSGEA), 2018-06.