Affiliation:
1. Department of Artificial Intelligence, Universidad Politécnica de Madrid, 28223, Madrid, Spain. E-mails: aalobaid@fi.upm.es, ocorcho@fi.upm.es
2. Department of Electronics and Computer Science, University of Southampton, UK. E-mail: e.kacprzak@soton.ac.uk
Abstract
A large amount of tabular data is being published on the Web. Semantic labeling of such data may help in understanding and exploiting it. However, many challenges need to be addressed to do this automatically. Numeric values make the task even harder because of possible differences in measurement accuracy, rounding errors, and even the frequency of their appearance. Multiple approaches have been proposed in the literature to tackle the problem of semantic labeling of numeric values in existing tabular datasets. However, they suffer from several shortcomings: they are closely coupled with entity linking, rely on table context, need to profile the knowledge graph, and require manual training of the model. Above all, they treat all types of numeric values the same way. In this paper, we tackle these problems and test our hypothesis that taking into account the typology of numeric data in semantic labeling yields better results.
Subject
Computer Networks and Communications, Computer Science Applications, Information Systems
Cited by
8 articles.
1. Linear approximation of the quantile–quantile plot for semantic labelling of numeric columns in tabular data;Expert Systems with Applications;2024-03
2. Dataset Discovery and Exploration: A Survey;ACM Computing Surveys;2023-11-09
3. Product Entity Matching via Tabular Data;Proceedings of the 32nd ACM International Conference on Information and Knowledge Management;2023-10-21
4. SAND: Semantic Annotation of Numeric Data in Web Tables;Proceedings of the 32nd ACM International Conference on Information and Knowledge Management;2023-10-21
5. Tab2KG: Semantic table interpretation with lightweight semantic profiles;Semantic Web;2022-04-06