Entity Matching by Pool-Based Active Learning

Published:2024-01-30 Issue:3 Volume:13 Page:559
ISSN:2079-9292
Container-title:Electronics
language:en

Author:

Han Youfang¹, Li Chunping¹

Affiliation:

1. School of Software, Tsinghua University, Beijing 100084, China

Abstract

The goal of entity matching is to find records in different data sources that represent the same entity. Among current mainstream methods, rule-based entity matching requires extensive domain knowledge, while machine-learning-based and deep-learning-based entity matching requires a large number of labeled samples to build the model, which is difficult to obtain in some applications. In addition, learning-based methods are more likely to overfit, so they place high demands on the quality of training samples. In this paper, we present an active learning method for entity matching tasks. The method requires manually labeling only a small number of valuable samples and uses these labeled samples to build a high-quality model. We propose hybrid uncertainty as a query strategy for finding those valuable samples, which minimizes the number of labeled training samples while still meeting the requirements of entity matching tasks. The proposed method is validated on seven data sets from different fields. The experiments show that it uses only a small number of labeled samples and achieves better results than existing approaches.
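
The pool-based loop described in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's hybrid uncertainty strategy, which is not defined in this record; plain entropy-based uncertainty sampling stands in for it. The sketch assumes each candidate record pair is reduced to a single similarity score, that the labeling oracle is noise-free, and that the seed set contains at least one match and one non-match. All names (`match_probability`, `fit_threshold`, `active_learn`, the `sharpness` parameter) are hypothetical and chosen for illustration only.

```python
import math


def match_probability(score, threshold, sharpness=10.0):
    """Logistic estimate that a pair with this similarity score is a match."""
    return 1.0 / (1.0 + math.exp(-sharpness * (score - threshold)))


def entropy(p):
    """Binary entropy in bits; largest (1.0) when p == 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def fit_threshold(pool, labels):
    """Toy model: put the threshold midway between the two labeled classes."""
    lowest_match = min(pool[i] for i, y in labels.items() if y)
    highest_non_match = max(pool[i] for i, y in labels.items() if not y)
    return (lowest_match + highest_non_match) / 2.0


def active_learn(pool, oracle, seed_indices, budget):
    """Label `budget` extra pairs, always querying the most uncertain one."""
    labels = {i: oracle(pool[i]) for i in seed_indices}
    for _ in range(budget):
        threshold = fit_threshold(pool, labels)
        unlabeled = [i for i in range(len(pool)) if i not in labels]
        if not unlabeled:
            break
        # Query strategy: the unlabeled pair whose predicted match
        # probability has the highest entropy (closest to 0.5).
        query = max(
            unlabeled,
            key=lambda i: entropy(match_probability(pool[i], threshold)),
        )
        labels[query] = oracle(pool[query])  # simulated manual labeling
    return fit_threshold(pool, labels), labels


# Synthetic pool of similarity scores 0.00 .. 1.00 (an assumption, not paper data).
pool = [i / 100 for i in range(101)]
oracle = lambda score: score > 0.615  # hidden ground-truth boundary
threshold, labels = active_learn(pool, oracle, seed_indices=[0, 100], budget=10)
print(f"learned threshold: {threshold:.3f} from {len(labels)} labeled pairs")
```

In this toy setting the query strategy behaves like a binary search over the pool, which is why a dozen labels are enough to separate 101 candidate pairs. A realistic matcher would use several similarity features and a stronger classifier, but the label, retrain, and query cycle stays the same.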

Funder

NSFC

Publisher

MDPI AG

Link

https://www.mdpi.com/2079-9292/13/3/559/pdf

References: 52 articles.

1. Technical Perspective: Toward Building Entity Matching Management Systems;Tan;SIGMOD Rec.,2018

2. Frameworks for Entity Matching: A Comparison;Koepcke;Data Knowl. Eng.,2010

3. Magellan: Toward Building Entity Matching Management Systems;Konda;VLDB Endow.,2016

4. Christen, P. (2012). Data Matching, Springer.

5. Singh, R., Meduri, V., Elmagarmid, A., Madden, S., Papotti, P., Quiané-Ruiz, J.-A., Solar-Lezama, A., and Tang, N. (2017, January 14–19). Generating Concise Entity Matching Rules. Proceedings of the ACM International Conference on Management of Data, Chicago, IL, USA.

Cited by 1 article.

1. Synthesis of Optimal Correction Functions in the Class of Disjunctive Normal Forms;Mathematics;2024-07-05