An I/O-efficient disk-based graph system for scalable second-order random walk of large graphs

Author:

Li Hongzheng1, Shao Yingxia1, Du Junping1, Cui Bin2, Chen Lei3

Affiliation:

1. BUPT

2. Peking University (Qingdao), China

3. Hong Kong University of Science and Technology

Abstract

Random walks, especially first-order random walks, are widely used in graph analysis tasks. However, as a simplification of real-world problems, the first-order random walk is poor at modeling higher-order structures in the data. Recently, second-order random walk-based applications (e.g., Node2vec, Second-order PageRank) have become attractive. Because second-order random walk models are complex and memory is limited, running such applications on a single machine does not scale. Existing disk-based graph systems are designed for first-order random walk models and suffer from expensive disk I/Os when executing second-order random walks. This paper introduces GraSorw, an I/O-efficient disk-based graph system for scalable second-order random walks on large graphs. First, to eliminate massive light vertex I/Os, we develop a bi-block execution engine that converts random I/Os into sequential I/Os by applying a new triangular bi-block scheduling strategy, bucket-based walk management, and skewed walk storage. Second, to improve I/O utilization, we design a learning-based block loading model that leverages the advantages of both the full-load and on-demand load methods. Finally, we conduct extensive experiments on six large real datasets as well as several synthetic datasets. The empirical results demonstrate that the end-to-end time cost of popular tasks in GraSorw is reduced by more than one order of magnitude compared to existing disk-based graph systems.
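
To illustrate why second-order walks are harder on I/O than first-order ones, the sketch below shows a Node2vec-style step in memory. It is not GraSorw's implementation; the function names, the dict-of-sets graph layout, and the parameters `p` and `q` (Node2vec's return and in-out parameters) are assumptions made for illustration. The point is that choosing the next vertex requires the adjacency list of the previous vertex as well as that of the current one, and in a disk-based setting those two lists may sit in different blocks.

```python
import random


def node2vec_weights(graph, prev, cur, p=1.0, q=1.0):
    """Unnormalized Node2vec transition weights out of `cur`,
    given that the walk arrived at `cur` from `prev`.

    `graph` is assumed to map each vertex to a set of neighbours.
    """
    prev_neighbors = graph[prev]  # second-order: previous vertex's adjacency is needed too
    weights = {}
    for nxt in graph[cur]:
        if nxt == prev:
            weights[nxt] = 1.0 / p  # step back to the previous vertex
        elif nxt in prev_neighbors:
            weights[nxt] = 1.0      # candidate is also adjacent to the previous vertex
        else:
            weights[nxt] = 1.0 / q  # candidate moves further away from the previous vertex
    return weights


def second_order_walk(graph, start, length, p=1.0, q=1.0, seed=0):
    """Generate one walk containing at most `length` vertices."""
    rng = random.Random(seed)
    walk = [start]
    if length < 2 or not graph[start]:
        return walk
    # The first step has no previous vertex, so it is a uniform first-order step.
    walk.append(rng.choice(sorted(graph[start])))
    while len(walk) < length:
        prev, cur = walk[-2], walk[-1]
        weights = node2vec_weights(graph, prev, cur, p, q)
        if not weights:
            break  # dead end
        candidates = sorted(weights)
        walk.append(
            rng.choices(candidates, weights=[weights[c] for c in candidates], k=1)[0]
        )
    return walk
```

For example, on the small undirected graph `{0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}`, a walk that moved from vertex 0 to vertex 1 must look up the neighbours of both 0 and 1 before it can weight the candidates 0, 2 and 3.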

Publisher

Association for Computing Machinery (ACM)

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

Cited by 7 articles.

1. Distributed Graph Neural Network Training: A Survey;ACM Computing Surveys;2024-04-10

2. Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment;Proceedings of the ACM on Management of Data;2024-03-12

3. Enhancing Graph Random Walk Acceleration via Efficient Dataflow and Hybrid Memory Architecture;IEEE Transactions on Computers;2024-03

4. Multi-domain Recommendation with Embedding Disentangling and Domain Alignment;Proceedings of the 32nd ACM International Conference on Information and Knowledge Management;2023-10-21

5. LightTraffic: On Optimizing CPU-GPU Data Traffic for Efficient Large-scale Random Walks;2023 IEEE 39th International Conference on Data Engineering (ICDE);2023-04
