Distributed Graph Embedding with Information-Oriented Random Walks

Published:2023-03 Issue:7 Volume:16 Page:1643-1656
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Fang Peng¹, Khan Arijit², Luo Siqiang³, Wang Fang¹, Feng Dan¹, Li Zhenli¹, Yin Wei¹, Cao Yuchao¹

Affiliation:

1. Huazhong University of Science and Technology, China

2. Aalborg University, Denmark

3. Nanyang Technological University, Singapore

Abstract

Graph embedding maps graph nodes to low-dimensional vectors, and is widely adopted in machine learning tasks. The increasing availability of billion-edge graphs underscores the importance of learning efficient and effective embeddings on large graphs, for example for link prediction on Twitter with over one billion edges. Most existing graph embedding methods fall short of reaching high data scalability. In this paper, we present a general-purpose, distributed, information-centric random walk-based graph embedding framework, DistGER, which can scale to embed billion-edge graphs. DistGER incrementally computes information-centric random walks. It further leverages a multi-proximity-aware, streaming, parallel graph partitioning strategy, simultaneously achieving high local partition quality and excellent workload balancing across machines. DistGER also improves the distributed Skip-Gram learning model to generate node embeddings by optimizing the access locality, CPU throughput, and synchronization efficiency. Experiments on real-world graphs demonstrate that, compared to state-of-the-art distributed graph embedding frameworks, including KnightKing, DistDGL, and PyTorch-BigGraph, DistGER exhibits 2.33× to 129× acceleration, a 45% reduction in cross-machine communication, and a >10% effectiveness improvement in downstream tasks.

Publisher

Association for Computing Machinery (ACM)

Subject

General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3587136.3587140

Reference: 70 articles.

1. Streaming graph partitioning

2. L. Adamic, O. Buyukkokten, and E. Adar. 2003. A Social Network Caught in the Web. First Monday 8, 6 (2003).

3. A.-L. Barabasi and R. Albert. 1999. Emergence of Scaling in Random Networks. Science 286, 5439 (1999), 509-512.

4. M. Belkin and P. Niyogi. 2001. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering. In NeurIPS.

5. S. Bhagat, G. Cormode, and S. Muthukrishnan. 2011. Node Classification in Social Networks. In Social Network Data Analytics, Charu C. Aggarwal (Ed.). Springer, 115-148.

Cited by 8 articles.

1. TIGER: Training Inductive Graph Neural Network for Large-Scale Knowledge Graph Reasoning;Proceedings of the VLDB Endowment;2024-06

2. DeepWalk with Reinforcement Learning (DWRL) for node embedding;Expert Systems with Applications;2024-06

3. Efficient Approximation of Kemeny's Constant for Large Graphs;Proceedings of the ACM on Management of Data;2024-05-29

4. Scalable Node Embedding Algorithms Using Distributed Sparse Matrix Operations;2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW);2024-05-27

5. Synergies Between Graph Data Management and Machine Learning in Graph Data Pipeline;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13