Survey on web spam detection

Published:2012-05 Issue:2 Volume:13 Page:50-64
ISSN:1931-0145
Container-title:ACM SIGKDD Explorations Newsletter
language:en
Short-container-title:SIGKDD Explor. Newsl.

Author:

Spirin Nikita¹,Han Jiawei¹

Affiliation:

1. University of Illinois at Urbana-Champaign, Urbana, IL, USA

Abstract

Search engines have become the de facto starting point for information acquisition on the Web. However, because of web spam, search results are not always as good as desired. Moreover, spam evolves, which makes the problem of providing high-quality search even more challenging. Over the last decade, research on adversarial information retrieval has gained a lot of interest from both academia and industry. In this paper we present a systematic review of web spam detection techniques, with a focus on algorithms and underlying principles. We categorize all existing algorithms into three categories based on the type of information they use: content-based methods, link-based methods, and methods based on non-traditional data such as user behaviour, clicks, and HTTP sessions. In turn, we subcategorize the link-based category into five groups based on the ideas and principles used: label propagation, link pruning and reweighting, label refinement, graph regularization, and feature-based methods. We also define the concept of web spam numerically and provide a brief survey of various spam forms. Finally, we summarize the observations and underlying principles applied for web spam detection.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/2207243.2207252

References: 121 articles.

1. Graph regularization methods for Web spam detection

2. The connectivity sonar

3. Generalizing PageRank

Cited by 138 articles.

1. Enhanced Detection of Text and Image Spam Using Cost-Sensitive Deep Learning;Traitement du Signal;2024-06-26

2. HitSim: An Efficient Algorithm for Single-Source and Top-k SimRank Computation;Information;2024-06-12

3. A Similarity-based Approach for Efficient Large Quasi-clique Detection;Proceedings of the ACM Web Conference 2024;2024-05-13

4. DNS User Profiling and Risk Assessment: A Learning Approach;2024

5. Detection of Branded Posts in User-Generated Content;Lecture Notes in Computer Science;2024