Affiliation:
1. Indiana University and Institute for Scientific Interchange, Turin, Italy
2. Universidad Nacional del Sur and CONICET, Blanca, Argentina
3. Universitat de Barcelona, Barcelona, Spain
Abstract
The understanding of the immense and intricate topological structure of the World Wide Web (WWW) is a major scientific and technological challenge. This has been recently tackled by characterizing the properties of its representative graphs, in which vertices and directed edges are identified with Web pages and hyperlinks, respectively. Data gathered in large-scale crawls have been analyzed by several groups resulting in a general picture of the WWW that encompasses many of the complex properties typical of rapidly evolving networks. In this article, we report a detailed statistical analysis of the topological properties of four different WWW graphs obtained with different crawlers. We find that, despite the very large size of the samples, the statistical measures characterizing these graphs differ quantitatively, and in some cases qualitatively, depending on the domain analyzed and the crawl used for gathering the data. This spurs the issue of the presence of sampling biases and structural differences of Web crawls that might induce properties not representative of the actual global underlying graph. In short, the stability of the widely accepted statistical description of the Web is called into question. In order to provide a more accurate characterization of the Web graph, we study statistical measures beyond the degree distribution, such as degree-degree correlation functions or the statistics of reciprocal connections. The latter appears to enclose the relevant correlations of the WWW graph and carry most of the topological information of the Web. The analysis of this quantity is also of major interest in relation to the navigability and searchability of the Web.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications
Cited by
48 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献