Abstract
Entity Resolution is a technique for finding records that may refer to the same real-world entity within one or more data sources. It is mainly used in data integration and data cleaning, and has become increasingly important with the growth of Big Data. It not only helps organisations maintain clean data, but also provides a unified view of that data for later analysis. However, no single solution fits every deduplication problem, because the data itself is heterogeneous and varied. This paper investigates the usefulness of combining different matching approaches, compares token blocking with standard blocking, and examines how these approaches transfer to other domains by evaluating how well they perform in different scenarios. To answer these questions, the paper outlines the details and setup of the experiments to be executed. A detailed evaluation on multiple datasets demonstrates the effectiveness of the approaches.
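As background for the comparison mentioned above, the sketch below contrasts standard blocking (grouping records by the exact value of one chosen attribute) with token blocking (grouping records by every token that appears in any attribute value). It is a minimal illustrative example only, not the paper's implementation; the record layout, attribute names, and whitespace tokenisation are assumptions made for the sketch.

# Illustrative sketch: standard blocking vs. token blocking on toy records.
# This is NOT the paper's implementation; record schema and tokenisation are assumed.
from collections import defaultdict

def standard_blocking(records, key_attr):
    """Group record ids by the exact (normalised) value of one blocking attribute."""
    blocks = defaultdict(list)
    for rid, rec in records.items():
        blocks[rec.get(key_attr, "").strip().lower()].append(rid)
    return blocks

def token_blocking(records):
    """Group record ids by every whitespace-separated token found in any attribute value."""
    blocks = defaultdict(set)
    for rid, rec in records.items():
        for value in rec.values():
            for token in str(value).lower().split():
                blocks[token].add(rid)
    return blocks

if __name__ == "__main__":
    records = {
        1: {"title": "iPhone 13 Pro", "brand": "Apple"},
        2: {"title": "Apple iPhone 13", "brand": ""},
        3: {"title": "Galaxy S21", "brand": "Samsung"},
    }
    # Standard blocking on "brand" misses the (1, 2) pair because record 2 lacks a brand value.
    print(standard_blocking(records, "brand"))
    # Token blocking places records 1 and 2 together in the "iphone" and "13" blocks.
    print(token_blocking(records))

The trade-off the sketch illustrates is the usual one: standard blocking produces fewer candidate pairs but can miss true matches when the blocking attribute is missing or noisy, while token blocking is more robust to such errors at the cost of larger, overlapping blocks.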
Publisher
Springer Nature Switzerland