Abstract
Ontology-based data management and knowledge graphs have emerged in recent years as efficient approaches for managing and utilizing diverse and large data sets. Research on algorithms for automatic semantic labeling and modeling, a prerequisite for both, has made steady progress in the form of new approaches. These algorithms vary in the type of information they use (data schema, values, or metadata) and in their underlying methodology (e.g., different machine learning methods or the use of external knowledge bases). However, approaches that have become established over the years still have various weaknesses. Most are evaluated on a few small data corpora specific to the approach, which reduces comparability and limits what can be said about their general applicability and performance. Other research areas, such as computer vision or natural language processing, solve this problem by providing unified data corpora for the evaluation of specific algorithms and tasks. In this paper, we present and publish VC-SLAM to lay the necessary foundation for future research. The corpus allows semantic labeling and modeling approaches to be evaluated and compared across different methodologies, and it is the first corpus that additionally makes it possible to leverage textual data documentation for semantic labeling and modeling. Each of the 101 data sets it contains consists of labels, data, and metadata, together with corresponding semantic labels and a semantic model that were created manually by human experts using an ontology built explicitly for the corpus. We provide statistical information about the corpus, critically discuss its strengths and shortcomings, and test it with existing methods for labeling and modeling.
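To make the evaluation setting described in the abstract more concrete, here is a minimal sketch, under stated assumptions, of how one corpus entry (labels, data/metadata, expert semantic labels, and an expert semantic model) and two simple evaluation measures might be represented. The class `CorpusEntry`, its field names, the functions `labeling_accuracy` and `model_precision_recall`, and the toy data are all hypothetical illustrations; they are not the actual VC-SLAM file format, nor the metrics used in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CorpusEntry:
    """Hypothetical in-memory view of one corpus data set.

    Field names are illustrative assumptions, not the VC-SLAM file format.
    """
    dataset_id: str
    attribute_labels: list[str]            # attribute (column) names from the data set
    metadata: dict[str, str]               # e.g. title and textual documentation
    gold_labels: dict[str, str]            # attribute -> ontology concept chosen by experts
    # expert semantic model as (concept, relation, concept) triples
    gold_model: list[tuple[str, str, str]] = field(default_factory=list)


def labeling_accuracy(entry: CorpusEntry, predicted: dict[str, str]) -> float:
    """Share of expert-labeled attributes whose predicted concept matches exactly."""
    labeled = [a for a in entry.attribute_labels if a in entry.gold_labels]
    if not labeled:
        return 0.0
    hits = sum(1 for a in labeled if predicted.get(a) == entry.gold_labels[a])
    return hits / len(labeled)


def model_precision_recall(
    entry: CorpusEntry, predicted: list[tuple[str, str, str]]
) -> tuple[float, float]:
    """Triple-level precision and recall of a predicted semantic model (exact match)."""
    gold, pred = set(entry.gold_model), set(predicted)
    hits = len(gold & pred)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Toy, made-up example (not taken from the corpus).
entry = CorpusEntry(
    dataset_id="toy-parking",
    attribute_labels=["lat", "lon", "name"],
    metadata={"title": "Parking spaces", "description": "Locations of parking spaces."},
    gold_labels={"lat": "Latitude", "lon": "Longitude", "name": "Name"},
    gold_model=[("ParkingSpace", "hasLocation", "GeoPoint"),
                ("GeoPoint", "hasLatitude", "Latitude")],
)
print(labeling_accuracy(entry, {"lat": "Latitude", "lon": "Latitude", "name": "Name"}))
# → 0.6666666666666666
print(model_precision_recall(entry, [("ParkingSpace", "hasLocation", "GeoPoint")]))
# → (1.0, 0.5)
```

Exact string matching of concepts and triples is the simplest possible choice and is used here only for illustration; evaluations of semantic modeling approaches may use more tolerant graph-matching measures.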
Funder
Ministerium für Wirtschaft, Innovation, Digitalisierung und Energie des Landes Nordrhein-Westfalen
University of Wuppertal
Subject
Information Systems and Management, Computer Science Applications, Information Systems
Cited by 6 articles
1. A survey on semantic data management as intersection of ontology-based data access, semantic modeling and data lakes;Journal of Web Semantics;2024-07
2. The PLASMA Framework: Laying the Path to Domain-Specific Semantics in Dataspaces;Companion Proceedings of the ACM Web Conference 2023;2023-04-30
3. Collaborative Filtering Recommender System for Semantic Model Refinement;2023 IEEE 17th International Conference on Semantic Computing (ICSC);2023-02
4. DocSemMap 2.0;Proceedings of the 31st ACM International Conference on Information & Knowledge Management;2022-10-17
5. Using Node Embeddings to Generate Recommendations for Semantic Model Creation;Proceedings of the 24th International Conference on Enterprise Information Systems;2022