A study on machine learning techniques for the schema matching network problem

Published: 2021-11-23
Volume: 27, Issue: 1
ISSN: 0104-6500
Journal: Journal of the Brazilian Computer Society
Short title: J Braz Comput Soc
Language: English

Authors

Diego Rodrigues, Altigran da Silva

Abstract

Schema matching is the problem of finding semantic correspondences between elements from different schemas. It is challenging because disparate elements in the schemas often represent the same concept. Traditionally, the problem involves a single pair of schemas. Recently, however, there has been increasing interest in matching several related schemas at once, a problem known as schema matching networks, where the goal is to identify the elements from several schemas that correspond to a single concept. Machine learning has proved to be a competitive alternative for the traditional (pairwise) matching problem in several domains; building on this, we propose a family of machine-learning-based methods for schema matching networks. To avoid requiring a large amount of training data, we also propose a bootstrapping procedure that generates training data automatically, and we leverage constraints that arise in network scenarios to improve the quality of this data. We also study a strategy for receiving user feedback that asserts some of the generated matchings and, relying on this feedback, improves the quality of the final result. Our experiments show that our methods can outperform baselines, reaching an F1-score of up to 0.83.
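The abstract does not specify which network constraints the methods rely on. As a hedged illustration only, the sketch below implements two constraints commonly discussed for schema matching networks: a one-to-one constraint (an element matches at most one element of any other schema) and a cycle constraint (if a matches b and b matches c across three schemas, a should also match c). The (schema, attribute) representation, the greedy score-ordered filter, the toy data, and the function names `enforce_one_to_one` and `cycle_violations` are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical representation: an element is a (schema, attribute) pair,
# and a correspondence is a pair of elements from two different schemas.
Element = tuple[str, str]
Pair = tuple[Element, Element]


def enforce_one_to_one(scored: dict[Pair, float]) -> list[Pair]:
    """Greedy one-to-one filter (an assumed strategy): visit candidates from
    highest to lowest score and keep one only if neither element has already
    been matched to some element of the other element's schema."""
    used: set[tuple[Element, str]] = set()  # (element, schema it is matched into)
    kept: list[Pair] = []
    for (a, b), _score in sorted(scored.items(), key=lambda kv: kv[1], reverse=True):
        if (a, b[0]) in used or (b, a[0]) in used:
            continue  # would give a or b a second partner in the same schema
        used.add((a, b[0]))
        used.add((b, a[0]))
        kept.append((a, b))
    return kept


def cycle_violations(corrs: list[Pair]) -> set[tuple[Element, Element, Element]]:
    """Return triples (a, b, c) from three distinct schemas where a~b and
    b~c hold but the closing correspondence a~c is missing."""
    adj: dict[Element, set[Element]] = defaultdict(set)
    for a, b in corrs:
        adj[a].add(b)
        adj[b].add(a)
    found = set()
    for b, neighbours in adj.items():
        for a, c in combinations(sorted(neighbours), 2):
            if len({a[0], b[0], c[0]}) == 3 and c not in adj[a]:
                found.add((a, b, c))
    return found


# Hypothetical bootstrapped candidates with matcher scores.
candidates = {
    (("S1", "price"), ("S2", "cost")): 0.9,
    (("S1", "price"), ("S2", "amount")): 0.6,
    (("S2", "cost"), ("S3", "value")): 0.8,
    (("S1", "price"), ("S3", "total")): 0.7,
}
kept = enforce_one_to_one(candidates)
print(kept)
print(cycle_violations(kept))
```

In this toy run the lower-scoring S1.price/S2.amount candidate is dropped by the one-to-one filter, and the cycle check flags that S1.price and S2.cost are matched to different S3 attributes. Greedy ordering by score is just one simple way to resolve conflicts; flagged triples are natural candidates to discard from automatically generated training data or to present to a user for feedback.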

Funder

FAPESP

CAPES

CNPq

Publisher

Springer Science and Business Media LLC

Subject

General Computer Science

Link

https://link.springer.com/content/pdf/10.1186/s13173-021-00119-5.pdf

References (44 articles)

1. Bonifati A, Velegrakis Y (2011) Schema matching and mapping: from usage to evaluation. In: Proceedings of the 14th International Conference on Extending Database Technology, 527–529. Association for Computing Machinery, New York.

2. Do H-H, Rahm E (2002) COMA: a system for flexible combination of schema matching approaches. In: Proceedings of the 28th International Conference on Very Large Data Bases, 610–621. Morgan Kaufmann Publishers, San Francisco.

3. Madhavan J, Bernstein PA, Rahm E (2001) Generic schema matching with Cupid. In: Proceedings of the 27th International Conference on Very Large Data Bases, 49–58. The VLDB Endowment, New York.

4. Doan A, Domingos P, Halevy AY (2001) Reconciling schemas of disparate data sources: a machine-learning approach. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, 509–520. Association for Computing Machinery, New York.

5. Bernstein PA, Madhavan J, Rahm E (2011) Generic schema matching, ten years later. PVLDB 4(11):695–701.

Cited by 3 articles

1. Drag, Drop, Merge: A Tool for Streamlining Integration of Longitudinal Survey Instruments. Proceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics, 2024-06-14.

2. Optimising Sustainability Accounting: Using Language Models to Match and Merge Survey Indicators. Lecture Notes in Business Information Processing, 2024.

3. IDAGEmb: An Incremental Data Alignment Based on Graph Embedding. Lecture Notes in Computer Science, 2024.