Affiliation:
1. FORTH-ICS, Greece & University of Crete, Greece
2. FORTH-ICS, Greece
Abstract
Many applications need to fetch and assemble pieces of information from more than one source in order to build a semantic warehouse that offers more advanced query capabilities. In this paper the authors describe the corresponding requirements and challenges, focusing on the quality and value of the warehouse. To this end they introduce various metrics (or measures) for quantifying its connectivity, and consequently its ability to answer complex queries. The authors demonstrate the behaviour of these metrics on a real, operational semantic warehouse as well as on synthetically produced warehouses. The proposed metrics give an overview of the contribution of each source to the warehouse and quantify the value of the entire warehouse. Consequently, these metrics can be used for advancing data/endpoint profiling, and for this reason the authors use an extension of VoID to make them publishable. Such descriptions can be exploited for dataset/endpoint selection in the context of federated search. In addition, the authors show how the metrics can be used for monitoring a semantic warehouse after each reconstruction, thereby reducing the cost of quality checking, as well as for understanding its evolution over time.