Abstract
We study the problem of implementing graph algorithms efficiently on Pregel-like systems, which can be surprisingly challenging. Standard graph algorithms in this setting can incur unnecessary inefficiencies such as slow convergence or high communication or computation cost, typically due to structural properties of the input graphs such as large diameters or skew in component sizes. We describe several optimization techniques to address these inefficiencies. Our most general technique is based on the idea of performing some serial computation on a tiny fraction of the input graph, complementing Pregel's vertex-centric parallelism. We base our study on thorough implementations of several fundamental graph algorithms, some of which have, to the best of our knowledge, not been implemented on Pregel-like systems before. The algorithms and optimizations we describe are fully implemented in our open-source Pregel implementation. We present detailed experiments showing that our optimization techniques improve runtime significantly on a variety of very large graph datasets.
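As a rough illustration of the most general technique mentioned above (running the vertex-centric computation in parallel until only a small part of the graph is still active, then finishing that remainder serially), the following minimal sketch simulates supersteps in plain Python for a maximal independent set computation. It is not the authors' implementation and does not use any Pregel API; the function name `maximal_independent_set`, the parameter `serial_threshold`, and the choice of algorithm are assumptions made purely for illustration.

```python
def maximal_independent_set(adj, serial_threshold=0.1):
    """Return (independent_set, supersteps) for an undirected graph.

    adj maps each vertex to the set of its neighbours. serial_threshold is a
    hypothetical knob: once the undecided fraction of vertices drops to this
    value or below, the remainder is finished by a serial loop.
    """
    undecided = set(adj)
    in_set = set()
    supersteps = 0
    cutoff = serial_threshold * len(adj)

    # Simulated vertex-centric phase: one loop iteration stands in for one superstep.
    while len(undecided) > cutoff:
        supersteps += 1
        # A vertex joins the set if its id is smaller than every undecided neighbour's.
        winners = {v for v in undecided
                   if all(v < u for u in adj[v] if u in undecided)}
        in_set |= winners
        # Winners and their undecided neighbours are now decided and drop out.
        removed = set(winners)
        for v in winners:
            removed |= adj[v] & undecided
        undecided -= removed

    # Serial phase: greedily finish the small undecided remainder in one place.
    for v in sorted(undecided):
        if not (adj[v] & in_set):
            in_set.add(v)
    return in_set, supersteps


# Usage: a 20-vertex path, a large-diameter graph on which the parallel phase is slow.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 20} for i in range(20)}
_, steps_parallel_only = maximal_independent_set(path, serial_threshold=0.0)
_, steps_with_serial = maximal_independent_set(path, serial_threshold=0.5)
print(steps_parallel_only, steps_with_serial)
```

On this path only one vertex can join per simulated superstep, so handing the remainder to a serial loop cuts the number of supersteps. A real Pregel-like system would ship the small remaining subgraph to a single machine for this step, a detail the sketch leaves out.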
Subject
General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development
Cited by
67 articles.
1. How to Fit the SCC Algorithm Efficiently into Distributed Graph Iterative Computation;2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC);2024-07-02
2. On Querying Connected Components in Large Temporal Graphs;Proceedings of the ACM on Management of Data;2023-06-13
3. Parallel Strong Connectivity Based on Faster Reachability;Proceedings of the ACM on Management of Data;2023-06-13
4. Flash: A Framework for Programming Distributed Graph Processing Algorithms;2023 IEEE 39th International Conference on Data Engineering (ICDE);2023-04
5. Handling Iterations in Distributed Dataflow Systems;ACM Computing Surveys;2022-12-31