Affiliation:
1. Carnegie Mellon University
2. Universidade de São Paulo at São Carlos
3. Stanford University
Abstract
Given large, multimillion-node graphs (e.g., Facebook, Web-crawls, etc.), how do they evolve over time? How are they connected? What are the central nodes and the outliers? In this article we define the Radius plot of a graph and show how it can answer these questions. However, computing the Radius plot is prohibitively expensive for graphs reaching the planetary scale.
There are two major contributions in this article: (a) We propose HADI (HAdoop DIameter and radii estimator), a carefully designed and fine-tuned algorithm to compute the radii and the diameter of massive graphs, that runs on top of the Hadoop/MapReduce system, with excellent scale-up on the number of available machines; (b) We run HADI on several real-world datasets including YahooWeb (6B edges, 1/8 of a Terabyte), one of the largest public graphs ever analyzed.
Thanks to HADI, we report fascinating patterns on large networks, like the surprisingly small effective diameter, the multimodal/bimodal shape of the Radius plot, and its palindrome motion over time.
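To make the quantities named in the abstract concrete, here is a minimal sketch of a radius plot on a toy graph. It assumes the radius of a node is its eccentricity (the largest hop distance to any node it can reach), the diameter is the largest radius, and the Radius plot is the count of nodes per radius value. It uses exact breadth-first search on a single machine, so it is not HADI itself and would not scale to the graphs discussed here. The names `node_radius`, `radius_plot`, and `toy` are hypothetical.

```python
from collections import Counter, deque


def node_radius(adj, source):
    """Largest hop distance from source to any reachable node (exact BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def radius_plot(adj):
    """Map each radius value to the number of nodes having that radius."""
    return Counter(node_radius(adj, v) for v in adj)


# Hypothetical toy graph: the undirected path 0 - 1 - 2 - 3 - 4.
toy = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

plot = radius_plot(toy)   # radius -> node count
diameter = max(plot)      # largest radius over all nodes
```

On this path graph the middle node has the smallest radius and the two endpoints have the largest, so the most central node sits at the left end of the plot and the most peripheral nodes at the right end.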
Funder
Division of Information and Intelligent Systems
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Lawrence Livermore National Laboratory, Office of Science
Publisher
Association for Computing Machinery (ACM)
Cited by
77 articles.