Affiliation:
1. Megagon Labs, Mountain View, CA
Abstract
Entity matching refers to the task of determining whether two different representations refer to the same real-world entity. It continues to be a prevalent problem for many organizations where data resides in different sources and duplicates need to be identified and managed. The term “entity matching” also loosely refers to the broader problem of determining whether two heterogeneous representations of different entities should be associated together. This problem has an even wider scope of applications, from determining the subsidiaries of companies to matching jobs to job seekers, where matching outcomes can have significant consequences.
In this article, we first report our recent system Ditto, which is an example of a modern entity matching system based on pre-trained language models. Then we summarize recent solutions that apply deep learning and pre-trained language models to the entity matching task. Finally, we discuss research directions beyond entity matching, including the promise of synergistically integrating the blocking and entity matching steps, the need to examine methods that alleviate the steep training data requirements typical of deep learning and pre-trained language models, and the importance of generalizing entity matching solutions to handle the broader entity matching problem, which leads to an even more pressing need to explain matching outcomes.
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems and Management, Information Systems
Cited by
34 articles.
1. Creating A Patient Data Redundancy Detection Model using Deep Learning Methods; 2024 7th International Conference on Informatics and Computational Sciences (ICICoS); 2024-07-17
2. Threshold-Independent Fair Matching through Score Calibration; Proceedings of the Conference on Governance, Understanding and Integration of Data for Effective and Responsible AI; 2024-06-09
3. Applications and Challenges for Large Language Models: From Data Management Perspective; 2024 IEEE 40th International Conference on Data Engineering (ICDE); 2024-05-13
4. Gen-T: Table Reclamation in Data Lakes; 2024 IEEE 40th International Conference on Data Engineering (ICDE); 2024-05-13
5. Fairness-Aware Data Preparation for Entity Matching; 2024 IEEE 40th International Conference on Data Engineering (ICDE); 2024-05-13