A study on machine learning techniques for the schema matching network problem

Author:

Rodrigues Diego,Silva Altigran daORCID

Abstract

AbstractSchema matching is the problem of finding semantic correspondences between elements from different schemas. This is a challenging problem since disparate elements in the schemas often represent the same concept. Traditional instances of this problem involved a pair of schemas. However, recently, there has been an increasing interest in matching several related schemas at once, a problem known as schema matching networks. The goal is to identify elements from several schemas that correspond to a single concept. We propose a family of methods for schema matching networks based on machine learning, which proved to be a competitive alternative for the traditional matching problem in several domains. To overcome the issue of requiring a large amount of training data, we also propose a bootstrapping procedure to generate training data automatically. In addition, we leverage constraints that arise in network scenarios to improve the quality of this data. We also study a strategy for receiving user feedback to assert some of the matchings generated and, relying on this feedback, improve the final result’s quality. Our experiments show that our methods can outperform baselines, reaching F1-score up to 0.83.

Funder

FAPESP

CAPES

CNPq

Publisher

Springer Science and Business Media LLC

Subject

General Computer Science

Reference44 articles.

1. Bonifati A, Velegrakis Y (2011) Schema matching and mapping: from usage to evaluation In: Proceedings of the 14th International Conference on Extending Database Technology, 527–529.. Association for Computing Machinery, New York.

2. Do H-H, Rahm E (2002) COMA: a system for flexible combination of schema matching approaches In: Proceedings of the 28th International Conference on Very Large Data Bases, 610–621.. Morgan Kaufmann Publishers, San Francisco.

3. Madhavan J, Bernstein PA, Rahm E (2001) Generic schema matching with cupid In: Proceedings of the 27th International Conference on Very Large Data Bases, 49–58.. The VLDB Endowment, New York.

4. Doan A, Domingos P, Halevy AY (2001) Reconciling schemas of disparate data sources: a machine-learning approach In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, 509–520.. Association for Computing Machinery, New York.

5. Bernstein PA, Madhavan J, Rahm E (2011) Generic schema matching, ten years later. PVLDB 4(11):695–701.

Cited by 3 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Drag, Drop, Merge: A Tool for Streamlining Integration of Longitudinal Survey Instruments;Proceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics;2024-06-14

2. Optimising Sustainability Accounting: Using Language Models to Match and Merge Survey Indicators;Lecture Notes in Business Information Processing;2024

3. IDAGEmb: An Incremental Data Alignment Based on Graph Embedding;Lecture Notes in Computer Science;2024

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3