Affiliation:
1. Università di Roma La Sapienza
2. Yahoo! Research, Barcelona
Abstract
We propose link-based techniques for automatic detection of Web spam, a term referring to pages that use deceptive techniques to obtain undeservedly high scores in search engines. Web spam is widespread and difficult to counter, mostly because the large size of the Web renders many algorithms infeasible in practice.
We perform a statistical analysis of a large collection of Web pages. In particular, we compute statistics of the links in the vicinity of every Web page by applying rank propagation and probabilistic counting over the entire Web graph in a scalable way. These statistical features are used to build Web spam classifiers that consider only the link structure of the Web, regardless of page contents. We then present a study of the performance of each classifier alone, as well as their combined performance, by testing them over a large collection of Web link spam. After tenfold cross-validation, our best classifiers perform comparably to state-of-the-art spam classifiers that use content attributes, yet remain orthogonal to content-based methods.
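The probabilistic counting the abstract refers to can be illustrated with a Flajolet-Martin-style neighborhood sketch: each node gets a small bitmask, and masks are OR-ed along edges so that after r rounds each node's mask summarizes the set of pages within r links of it. The sketch below is a minimal illustration under assumed names (the toy graph, `fm_mask`, and `neighborhood_masks` are not from the paper); a production version would average many independent bitmaps per node to reduce variance.

```python
import hashlib

NUM_BITS = 32

def fm_mask(node: str) -> int:
    """Hash a node to a one-hot bitmask; bit i is set with probability ~2^-(i+1)."""
    h = int(hashlib.sha256(node.encode()).hexdigest(), 16)
    rho = (h & -h).bit_length() - 1  # index of lowest set bit of the hash
    return 1 << min(rho, NUM_BITS - 1)

def neighborhood_masks(graph: dict, radius: int) -> dict:
    """OR masks along out-edges for `radius` rounds; each node's mask then
    summarizes the set of nodes reachable within `radius` hops."""
    masks = {v: fm_mask(v) for v in graph}
    for _ in range(radius):
        new = dict(masks)
        for v, outs in graph.items():
            for w in outs:
                new[v] |= masks[w]  # fold in w's reachable set
        masks = new
    return masks

def estimate_count(mask: int) -> float:
    """Flajolet-Martin estimate: 2^R / 0.77351, with R the lowest unset bit."""
    r = 0
    while mask & (1 << r):
        r += 1
    return (2 ** r) / 0.77351

# Toy web graph: edges point from a page to the pages it links to.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
masks = neighborhood_masks(graph, radius=2)
for v in sorted(graph):
    print(v, estimate_count(masks[v]))
```

Because every round only ORs fixed-size bitmasks, the cost per round is linear in the number of edges, which is what makes this kind of supporter-count feature computable over an entire Web graph.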
Funder
Seventh Framework Programme
Ministero dell'Istruzione, dell'Università e della Ricerca
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications
References: 52 articles.
Cited by: 65 articles.