PromptEM

Published:2022-10 Issue:2 Volume:16 Page:369-378
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Wang Pengfei¹, Zeng Xiaocan¹, Chen Lu¹, Ye Fan¹, Mao Yuren¹, Zhu Junhao¹, Gao Yunjun¹

Affiliation:

1. Zhejiang University

Abstract

Entity Matching (EM), which aims to identify whether two entity records from two relational tables refer to the same real-world entity, is one of the fundamental problems in data management. Traditional EM assumes that the two tables are homogeneous with an aligned schema, whereas practical scenarios commonly involve entity records of different formats (e.g., relational, semi-structured, or textual types). It is not practical to unify their schemas due to the different formats. To support EM on format-different entity records, Generalized Entity Matching (GEM) has been proposed and has gained much attention recently. Existing GEM methods typically rely on supervised learning, which requires a large amount of high-quality labeled examples. However, the labeling process is extremely labor-intensive, which hinders the use of GEM. Low-resource GEM, i.e., GEM that requires only a small number of labeled examples, has become an urgent need. To this end, this paper, for the first time, focuses on low-resource GEM and proposes a novel low-resource GEM method, termed PromptEM. PromptEM addresses three challenging issues in low-resource GEM (i.e., designing GEM-specific prompt-tuning, improving pseudo-label quality, and running efficient self-training). Extensive experimental results on eight real benchmarks demonstrate the superiority of PromptEM in terms of effectiveness and efficiency.

Publisher

Association for Computing Machinery (ACM)

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3565816.3565836

Reference 57 articles.

1. Naser Ahmadi, Hansjorg Sand, and Paolo Papotti. 2022. Unsupervised Matching of Data and Text. In ICDE. 1058-1070.

2. Pasquale Balsebre, Dezhong Yao, Gao Cong, and Zhen Hai. 2022. Geospatial Entity Resolution. In WWW. 3061-3070.

3. Mikhail Bilenko and Raymond J Mooney. 2003. Adaptive duplicate detection using learnable string similarity measures. In SIGKDD. 39-48.

4. Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language Models are Few-Shot Learners. In NeurIPS. 1877-1901.

5. William W Cohen and Jacob Richman. 2002. Learning to match and cluster large high-dimensional data sets for data integration. In SIGKDD. 475-480.

Cited by 13 articles.

1. Enriching Relations with Additional Attributes for ER;Proceedings of the VLDB Endowment;2024-07

2. LRER: A Low-Resource Entity Resolution Framework with Hybrid Information;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

3. Fairness-Aware Data Preparation for Entity Matching;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13

4. MultiEM: Efficient and Effective Unsupervised Multi-Table Entity Matching;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13

5. Batch Hop-Constrained s-t Simple Path Query Processing in Large Graphs;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13