Affiliations:
1. Bohai University
2. Dalian University of Technology
Abstract
Graph clustering is an important technique in graph analysis, and measuring the similarity between graph nodes is the premise of graph clustering. SimRank, proposed by Jeh and Widom, is a general model for computing structural similarity. Because SimRank computes node similarities iteratively, its time and space complexity are very high. With the rapid growth of data, a single machine can no longer meet the demands of large-scale computation. In this paper, a distributed SimRank algorithm based on MapReduce is proposed and used to measure the similarity of graph nodes. A distributed AP clustering algorithm is then designed to cluster the graph nodes. Experiments comparing clustering running time and speedup show that the method efficiently measures node similarity and effectively clusters large graphs.
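The abstract does not describe how the iteration is split into map and reduce steps or how the AP clustering is distributed. What follows is a minimal single-process sketch under stated assumptions, not the authors' implementation. It assumes naive SimRank, s(a,b) = C/(|I(a)||I(b)|) · Σ s(i,j) over in-neighbours i of a and j of b, with each iteration written as a map phase (emit each score to the node pairs it feeds) and a reduce phase (sum and normalise per pair). It reads "AP" as Frey and Dueck's affinity propagation. The function names `simrank_mapreduce`, `affinity_propagation` and `cluster_graph`, together with the defaults (C = 0.8, 5 iterations, damping 0.5, median preference), are hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import median


def simrank_mapreduce(edges, nodes, c=0.8, iterations=5):
    """Naive SimRank; each iteration is written as a map phase plus a reduce phase.

    edges: directed (u, v) pairs without duplicates, so u is an in-neighbour of v.
    Both phases run in this process (an assumption for illustration); a real
    MapReduce job would shuffle the emitted (key, value) pairs by key.
    """
    out_nbrs = defaultdict(list)
    in_deg = defaultdict(int)
    for u, v in edges:
        out_nbrs[u].append(v)
        in_deg[v] += 1
    sim = {(x, x): 1.0 for x in nodes}  # s_0 is the identity
    for _ in range(iterations):
        # Map: a nonzero score s(i, j) is sent to every pair (a, b) with
        # i in I(a) and j in I(b), i.e. a is an out-neighbour of i and b of j.
        emitted = defaultdict(list)
        for (i, j), score in sim.items():
            for a in out_nbrs[i]:
                for b in out_nbrs[j]:
                    if a != b:
                        emitted[(a, b)].append(score)
        # Reduce: sum the contributions per key and normalise by in-degrees.
        new_sim = {(x, x): 1.0 for x in nodes}
        for (a, b), scores in emitted.items():
            new_sim[(a, b)] = c * sum(scores) / (in_deg[a] * in_deg[b])
        sim = new_sim
    return sim  # pairs that never received a contribution have similarity 0


def affinity_propagation(s, damping=0.5, iterations=200):
    """Frey-Dueck affinity propagation on an n x n similarity matrix (n >= 2).

    The diagonal s[k][k] holds node k's preference for being an exemplar.
    Returns a list mapping each index to the index of its exemplar.
    """
    n = len(s)
    r = [[0.0] * n for _ in range(n)]
    a = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        # Responsibility r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(vals[j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # Availability a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k});
        # self-availability a(k,k) = sum of positive r(i',k) over i' != k
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:  # not converged: fall back to the single strongest candidate
        exemplars = [max(range(n), key=lambda k: r[k][k] + a[k][k])]
    # Each exemplar labels itself; every other node joins its most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
            for i in range(n)]


def cluster_graph(edges, c=0.8):
    """Hypothetical pipeline: SimRank similarities fed into affinity propagation."""
    nodes = sorted({x for edge in edges for x in edge})
    sim = simrank_mapreduce(edges, nodes, c=c)
    n = len(nodes)
    s = [[sim.get((nodes[i], nodes[j]), 0.0) for j in range(n)] for i in range(n)]
    # Common default: the median off-diagonal similarity as every node's preference.
    pref = median(s[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        s[i][i] = pref
    labels = affinity_propagation(s)
    return {nodes[i]: nodes[labels[i]] for i in range(n)}
```

Consider a graph with edges u1→a, u1→b, u2→a and u2→b. Here a and b share both in-neighbours, so the first reduce gives s(a,b) = 0.8 · (1 + 1) / (2 · 2) = 0.4. Because a and b have no out-neighbours, that value does not change in later iterations. In this sketch each map emission depends only on one score and the adjacency lists, which is why the map step parallelises naturally. The reduce step needs only the in-degrees of the key's two nodes.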
Publisher
Trans Tech Publications, Ltd.
References (9 articles)
1. H. C. Wang, J. Ma, Study of Efficient Clustering Algorithm on Large Graphs, Journal of Chinese Computer Systems, vol. 34, no. 6, pp. 1417-1423, (2013).
2. F. Du, Y. G. Chen, X. Y. Du, Survey of RDF Query Processing Techniques, Journal of Software, vol. 24, no. 6, pp. 1222-1241, (2013).
3. G. Wu, Research on Key Technologies of RDF Graph Data Management, Tsinghua University Press, (2008).
4. P. Zhao, J. Han, Y. Sun, P-Rank: A Comprehensive Structural Similarity Measure over Information Networks, International Conference on Information and Knowledge Management (CIKM), (2009).
5. G. Jeh, J. Widom, SimRank: A Measure of Structural-Context Similarity, Proceedings of the Eighth ACM SIGKDD Conference (KDD '02), (2002).
Cited by 1 article.