Analyzing how BERT performs entity matching

Author:

Matteo Paganelli¹, Francesco Del Buono¹, Andrea Baraldi¹, Francesco Guerra¹

Affiliation:

1. University of Modena and Reggio Emilia, Modena, Italy

Abstract

State-of-the-art Entity Matching (EM) approaches rely on transformer architectures, such as BERT, to generate highly contextualized embeddings of terms. These embeddings are then used to predict whether pairs of entity descriptions refer to the same real-world entity. BERT-based EM models have proved effective, but they act as black boxes for users, who have limited insight into the motivations behind their decisions. In this paper, we perform a multi-facet analysis of the components of pre-trained and fine-tuned BERT architectures applied to an EM task. The main findings resulting from our extensive experimental evaluation are: (1) fine-tuning on the EM task mainly modifies the last layers of the BERT components, but in a different way for tokens belonging to descriptions of matching versus non-matching entities; (2) the special structure of EM datasets, where records are pairs of entity descriptions, is recognized by BERT; (3) the pair-wise semantic similarity of tokens is not a key piece of knowledge exploited by BERT-based EM models.
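The pairwise structure mentioned in finding (2) is a consequence of how EM records are fed to BERT: the two entity descriptions are flattened into a single input sequence separated by special tokens. A minimal illustrative sketch (not the authors' code; attribute names and the serialization scheme are assumptions modeled on common BERT-based EM pipelines):

```python
# Illustrative sketch: serialize an EM record -- a pair of entity
# descriptions -- into one BERT-style input sequence, so the model
# can attend across both descriptions at once.

def serialize_pair(left: dict, right: dict) -> str:
    """Flatten two entity descriptions into a single sequence.

    [CLS] marks the sequence start; [SEP] marks the boundary between
    the two descriptions, which is how BERT sees the pair structure.
    """
    def flatten(entity: dict) -> str:
        # Interleave attribute names and values as plain tokens.
        return " ".join(f"{attr} {val}" for attr, val in entity.items())

    return f"[CLS] {flatten(left)} [SEP] {flatten(right)} [SEP]"

# Hypothetical product records from two sources describing the same entity.
left = {"title": "iphone 12 64gb", "brand": "apple"}
right = {"title": "apple iphone 12 (64 gb)", "brand": "apple"}
print(serialize_pair(left, right))
```

A classifier head on top of the resulting `[CLS]` embedding then predicts match / non-match for the serialized pair.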

Publisher

Association for Computing Machinery (ACM)



Cited by 12 articles.

1. Better entity matching with transformers through ensembles. Knowledge-Based Systems, 2024-06.

2. Automatic Data Repair: Are We Ready to Deploy? Proceedings of the VLDB Endowment, 2024-06.

3. Fairness-Aware Data Preparation for Entity Matching. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 2024-05-13.

4. A Critical Re-evaluation of Record Linkage Benchmarks for Learning-Based Matching Algorithms. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 2024-05-13.

5. Explaining Entity Matching with Clusters of Words. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 2024-05-13.
