Affiliation:
1. Cornell University, Ithaca, NY
2. Huazhong University of Science and Technology, Wuhan, China
3. North Carolina State University, Raleigh, NC
Abstract
Large graphs arise in a number of contexts and understanding their structure and extracting information from them is an important research area. Early algorithms for mining communities have focused on global graph structure, and often run in time proportional to the size of the entire graph. As we explore networks with millions of vertices and find communities of size in the hundreds, it becomes important to shift our attention from macroscopic structure to microscopic structure in large networks. A growing body of work has been adopting local expansion methods in order to identify communities from a few exemplary seed members.
In this article, we propose a novel approach for finding overlapping communities called LEMON (Local Expansion via Minimum One Norm). Provided with a few known seeds, the algorithm finds the community by performing a local spectral diffusion. The core idea of LEMON is to use short random walks to approximate an invariant subspace near a seed set, which we refer to as local spectra. Local spectra can be viewed as the low-dimensional embedding that captures the nodes’ closeness in the local network structure. We show that LEMON’s performance in detecting communities is competitive with state-of-the-art methods. Moreover, the running time scales with the size of the community rather than that of the entire graph. The algorithm is easy to implement and is highly parallelizable. We further provide theoretical analysis of the local spectral properties, bounding the measure of tightness of the extracted community using the eigenvalues of the graph Laplacian.
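To make the idea of a short random walk started from a seed set concrete, here is a minimal sketch in pure Python. It is not the LEMON algorithm itself: it omits the construction of the invariant subspace and the minimum one-norm step, and simply ranks nodes by the degree-normalized probability mass accumulated over a few walk steps. The toy graph (two 5-node cliques joined by one edge), the function names, and the choice of three steps are all assumptions made for illustration.

```python
def build_two_clique_graph(k=5):
    """Hypothetical toy graph: two k-node cliques joined by a single edge."""
    adj = {v: set() for v in range(2 * k)}
    for base in (0, k):
        for u in range(base, base + k):
            for v in range(u + 1, base + k):
                adj[u].add(v)
                adj[v].add(u)
    # Bridge between the two cliques.
    adj[k - 1].add(k)
    adj[k].add(k - 1)
    return adj


def walk_step(adj, p):
    """One step of a simple random walk: each node splits its mass evenly among neighbors."""
    nxt = {v: 0.0 for v in adj}
    for u, mass in p.items():
        if mass == 0.0:
            continue
        share = mass / len(adj[u])
        for v in adj[u]:
            nxt[v] += share
    return nxt


def short_walk_scores(adj, seeds, steps=3):
    """Accumulate walk probabilities over a few steps, then normalize by degree."""
    p = {v: 0.0 for v in adj}
    for s in seeds:
        p[s] = 1.0 / len(seeds)
    total = dict(p)
    for _ in range(steps):
        p = walk_step(adj, p)
        for v in adj:
            total[v] += p[v]
    return {v: total[v] / len(adj[v]) for v in adj}


adj = build_two_clique_graph()
scores = short_walk_scores(adj, seeds=[0], steps=3)
# Take the five highest-scoring nodes as the candidate community.
community = sorted(sorted(adj, key=lambda v: -scores[v])[:5])
print(community)
```

Because the walk is short, only nodes within a few hops of the seed receive any mass, which is the intuition behind a running time that depends on the community size rather than on the whole graph. On this toy graph the five top-ranked nodes are the seed's own clique.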
We thoroughly evaluate our approach using both synthetic and real-world datasets across different domains, and analyze the empirical variations when applying our method to inherently different networks in practice. We also provide heuristics on how the quality and quantity of the seed set affect performance.
Funder
US Army Research Office
National Science Foundation of China
Publisher
Association for Computing Machinery (ACM)
Cited by
69 articles.