Collective entity resolution in relational data

Published:2007-03 Issue:1 Volume:1 Page:5
ISSN:1556-4681
Container-title:ACM Transactions on Knowledge Discovery from Data
language:en
Short-container-title:ACM Trans. Knowl. Discov. Data

Author:

Bhattacharya Indrajit¹, Getoor Lise¹

Affiliation:

1. University of Maryland, College Park, MD

Abstract

Many databases contain uncertain and imprecise references to real-world entities. The absence of identifiers for the underlying entities often results in a database which contains multiple references to the same entity. This can lead not only to data redundancy, but also inaccuracies in query processing and knowledge extraction. These problems can be alleviated through the use of entity resolution. Entity resolution involves discovering the underlying entities and mapping each database reference to these entities. Traditionally, entities are resolved using pairwise similarity over the attributes of references. However, there is often additional relational information in the data. Specifically, references to different entities may cooccur. In these cases, collective entity resolution, in which entities for cooccurring references are determined jointly rather than independently, can improve entity resolution accuracy. We propose a novel relational clustering algorithm that uses both attribute and relational information for determining the underlying domain entities, and we give an efficient implementation. We investigate the impact that different relational similarity measures have on entity resolution quality. We evaluate our collective entity resolution algorithm on multiple real-world databases. We show that it improves entity resolution performance over both attribute-based baselines and over algorithms that consider relational information but do not resolve entities collectively. In addition, we perform detailed experiments on synthetically generated data to identify data characteristics that favor collective relational resolution over purely attribute-based algorithms.
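The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea of collective resolution, not the authors' implementation. It assumes a greedy agglomerative clustering in which the similarity of two clusters is a weighted sum of an attribute score and a relational score. The weight `alpha`, the use of Jaccard overlap between neighboring clusters as the relational measure, the toy `attribute_sim` rule (which expects names of the form "First Last"), the merge `threshold`, and the example data are all illustrative assumptions.

```python
from itertools import combinations


def attribute_sim(a, b):
    """Toy name similarity (assumption): 1.0 for identical names, 0.7 when the
    last names match and one first name is an initial of the other, else 0.0."""
    first_a, last_a = a.rsplit(" ", 1)
    first_b, last_b = b.rsplit(" ", 1)
    if last_a != last_b:
        return 0.0
    if first_a == first_b:
        return 1.0
    fa, fb = first_a.rstrip("."), first_b.rstrip(".")
    if (len(fa) == 1 or len(fb) == 1) and fa[0] == fb[0]:
        return 0.7
    return 0.0


def jaccard(s, t):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    return len(s & t) / len(s | t) if (s or t) else 0.0


def resolve(names, cooccurrences, alpha=0.4, threshold=0.5):
    """Greedily merge the most similar pair of clusters until no pair reaches
    `threshold`. `names` maps reference id -> name string; `cooccurrences` is
    a list of sets of reference ids that appear together (e.g. one paper)."""
    members = {r: {r} for r in names}    # cluster label -> references in it
    cluster_of = {r: r for r in names}   # reference -> current cluster label

    def neighbors(c):
        # Clusters of the references that co-occur with any member of c.
        out = set()
        for group in cooccurrences:
            if group & members[c]:
                out |= {cluster_of[r] for r in group - members[c]}
        return out

    def sim(c1, c2):
        attr = max(attribute_sim(names[a], names[b])
                   for a in members[c1] for b in members[c2])
        rel = jaccard(neighbors(c1), neighbors(c2))
        return (1 - alpha) * attr + alpha * rel

    while True:
        best = max(((sim(a, b), a, b)
                    for a, b in combinations(sorted(members), 2)),
                   default=None)
        if best is None or best[0] < threshold:
            break
        _, a, b = best
        members[a] |= members.pop(b)     # merge cluster b into cluster a
        for r in members[a]:
            cluster_of[r] = a
    return sorted(sorted(m) for m in members.values())


# Hypothetical data: three papers, each a set of co-occurring author references.
names = {"r1": "J. Smith", "r2": "M. Jones",
         "r3": "John Smith", "r4": "M. Jones",
         "r5": "Jane Smith", "r6": "R. Patel"}
papers = [{"r1", "r2"}, {"r3", "r4"}, {"r5", "r6"}]

collective = resolve(names, papers, alpha=0.4)      # attributes + relations
attribute_only = resolve(names, papers, alpha=0.0)  # attributes only
print(collective)
print(attribute_only)
```

In this toy data, "J. Smith" is equally compatible with "John Smith" and "Jane Smith" on attributes alone, so the attribute-only run merges all three. In the collective run, the two "M. Jones" references are merged first, and the shared coauthor cluster then raises the score between "J. Smith" and "John Smith" while "Jane Smith" stays separate.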

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/1217299.1217304

References: 35 articles.

1. Friends and neighbors on the Web

2. Iterative record linkage for cleaning and integration

3. Bhattacharya I. and Getoor L. 2006a. Entity resolution in graphs. In Mining Graph Data. L. Holder and D. Cook Eds. John Wiley.

Cited by 324 articles.

1. From plan to practice: Interorganizational crisis response networks from governmental guidelines and real‐world collaborations during hurricane events;Journal of Contingencies and Crisis Management;2024-07-27

2. Automatic generation of system model diagrams driven by multi-source heterogeneous data;Journal of Engineering Design;2024-07-06

3. High‐degree penalty based global statistical network embedding for name disambiguation in anonymized graph;Concurrency and Computation: Practice and Experience;2024-06-02

4. De-Anonymizing Users across Rating Datasets via Record Linkage and Quasi-Identifier Attacks;Data;2024-05-27

5. Collaborative contrastive learning for hypergraph node classification;Pattern Recognition;2024-02