Efficient Unsupervised Community Search with Pre-Trained Graph Transformer

Published:2024-05 Issue:9 Volume:17 Page:2227-2240
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Wang Jianwei¹, Wang Kai², Lin Xuemin², Zhang Wenjie¹, Zhang Ying³

Affiliation:

1. University of New South Wales

2. ACEM, Shanghai Jiao Tong University

3. Zhejiang Gongshang University

Abstract

Community search has aroused widespread interest in the past decades. Among existing solutions, the learning-based models exhibit outstanding performance in terms of accuracy by leveraging labels to 1) train the model for community score learning, and 2) select the optimal threshold for community identification. However, labeled data are not always available in real-world scenarios. To address this notable limitation of learning-based models, we propose a pre-trained graph Transformer-based community search framework that uses Zero label (i.e., unsupervised), termed TransZero. TransZero has two key phases, i.e., the offline pre-training phase and the online search phase. Specifically, in the offline pre-training phase, we design an efficient and effective community search graph transformer (CSGphormer) to learn node representation. To pre-train CSGphormer without the usage of labels, we introduce two self-supervised losses, i.e., personalization loss and link loss, motivated by the inherent uniqueness of node and graph topology, respectively. In the online search phase, with the representation learned by the pre-trained CSGphormer, we compute the community score without using labels by measuring the similarity of representations between the query nodes and the nodes in the graph. To free the framework from the usage of a label-based threshold, we define a new function named expected score gain to guide the community identification process. Furthermore, we propose two efficient and effective algorithms for the community identification process that run without the usage of labels. Extensive experiments over 10 public datasets illustrate the superior performance of TransZero regarding both accuracy and efficiency.
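
As a rough illustration of the online scoring step described in the abstract, the sketch below scores each node by how similar its representation is to those of the query nodes. The abstract does not specify the similarity measure or how multiple query nodes are combined, so cosine similarity and mean aggregation are assumptions here, and the embeddings are hand-made toy values rather than CSGphormer output. The function names are hypothetical. The expected score gain and the identification algorithms are not covered.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def community_scores(embeddings, query_nodes):
    """Score every node by its mean cosine similarity to the query nodes.

    Assumption: cosine similarity and mean aggregation are illustrative
    choices; the abstract only says similarity of representations is measured.
    """
    scores = {}
    for node, vec in embeddings.items():
        sims = [cosine(vec, embeddings[q]) for q in query_nodes]
        scores[node] = sum(sims) / len(sims)
    return scores


# Toy, hand-made embeddings (not produced by any pre-trained model).
embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
}

scores = community_scores(embeddings, query_nodes=["a"])
# Nodes ordered from most to least similar to the query node "a".
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy setting, node "b" ranks just below the query node itself and node "c" ranks last, since its vector is orthogonal to the query's. Deciding where to cut the ranking without a label-tuned threshold is the part the abstract attributes to the expected score gain function.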

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.14778/3665844.3665853

Reference: 64 articles.

1. Miklós Ajtai, János Komlós, and Endre Szemerédi. 1983. An O(n log n) sorting network. In Proceedings of the fifteenth annual ACM symposium on Theory of computing. 1--9.

2. Truss-based community search: a truss-equivalence based indexing approach;Akbas Esra;Proceedings of the VLDB Endowment,2017

3. Takuya Akiba, Yoichi Iwata, and Yuichi Yoshida. 2013. Linear-time enumeration of maximal K-edge-connected subgraphs in large networks by random contraction. In 22nd ACM International Conference on Information and Knowledge Management, CIKM'13, San Francisco, CA, USA, October 27 - November 1, 2013, Qi He, Arun Iyengar, Wolfgang Nejdl, Jian Pei, and Rajeev Rastogi (Eds.). ACM, 909--918. 10.1145/2505515.2505751

4. Uri Alon and Eran Yahav. 2021. On the Bottleneck of Graph Neural Networks and its Practical Implications. In International Conference on Learning Representations. https://openreview.net/forum?id=i80OPhOCVH2

5. Reid Andersen, Fan Chung, and Kevin Lang. 2006. Local graph partitioning using pagerank vectors. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06). IEEE, 475--486.