Affiliation:
1. University of Maryland, College Park, MD
2. University of Maryland, College Park
Abstract
Data describing networks—such as communication networks, transaction networks, disease transmission networks, collaboration networks, etc.—are becoming increasingly available. While observational data can be useful, it often only hints at the actual underlying process that governs interactions and attributes. For example, an email communication network provides insight into its users and their relationships, but is not the same as the “real” underlying social network. In this article, we introduce the problem of
graph identification
, i.e., discovering the latent graph structure underlying an observed network. We cast the problem as a probabilistic inference task, in which we must infer the nodes, edges, and node labels of a hidden graph, based on evidence. This entails solving several canonical problems in network analysis:
entity resolution
(determining when two observations correspond to the same entity),
link prediction
(inferring the existence of links), and
node labeling
(inferring hidden attributes). While each of these subproblems has been well studied in isolation, here we consider them as a single, collective task. We present a simple, yet novel, approach to address all three subproblems simultaneously. Our approach, which we refer to as
C
3
, consists of a collection of Coupled Collective Classifiers that are applied iteratively to propagate inferred information among the subproblems. We consider variants of
C
3
using different learning and inference techniques and empirically demonstrate that
C
3
is superior, both in terms of predictive accuracy and running time, to state-of-the-art probabilistic approaches on four real problems.
Funder
National Science Foundation
Publisher
Association for Computing Machinery (ACM)
Cited by
9 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献