Affiliation:
1. Nanjing University and RWTH Aachen University, Ahornstrasse, Aachen, Germany
2. Nanjing University, Nanjing, Jiangsu, China
Abstract
Triple-structured open data creates value in many ways. However, reusing datasets remains challenging: users find it difficult to assess the usefulness of a large dataset containing thousands or millions of triples. To meet this need, existing abstractive methods produce a concise, high-level abstraction of the data. Complementary to that, we adopt an extractive strategy and aim to select an optimal small subset of the data as a snippet that compactly illustrates the content of the dataset. We formulated this as a combinatorial optimization problem in our previous work. In this article, we design a new algorithm for the problem that is an order of magnitude faster than the previous one while retaining the same approximation ratio. We also develop an anytime algorithm that can generate empirically better solutions given additional time. To suit datasets that are partially accessible via online query services (e.g., SPARQL endpoints for RDF data), we adapt our algorithms to trade off snippet quality for feasibility and efficiency in the Web environment. We carry out extensive experiments on real RDF datasets and SPARQL endpoints to evaluate quality and running time. The results demonstrate the effectiveness and practicality of our proposed algorithms.
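To make the extractive idea concrete, the following is a minimal sketch of greedy subset selection over triples. It is not the algorithm from the article: the objective used here (counting distinct properties and classes covered) is a stand-in assumption, and the names `greedy_snippet` and `elements` as well as the sample triples are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def elements(t: Triple) -> FrozenSet[str]:
    """Schema elements a triple covers (assumed objective, not the paper's).

    Every triple covers its property; an rdf:type triple also covers the class.
    """
    _, p, o = t
    return frozenset({p, o}) if p == "rdf:type" else frozenset({p})


def greedy_snippet(triples: List[Triple], k: int) -> List[Triple]:
    """Greedily pick up to k triples, each adding the most uncovered elements."""
    covered: Set[str] = set()
    snippet: List[Triple] = []
    remaining = list(triples)
    while remaining and len(snippet) < k:
        # Choose the triple with the largest marginal gain in coverage.
        best = max(remaining, key=lambda t: len(elements(t) - covered))
        if not elements(best) - covered:
            break  # nothing left adds new coverage
        snippet.append(best)
        covered |= elements(best)
        remaining.remove(best)
    return snippet


if __name__ == "__main__":
    data = [
        ("a", "rdf:type", "Person"),
        ("a", "name", "Alice"),
        ("b", "rdf:type", "Person"),
        ("b", "name", "Bob"),
        ("a", "knows", "b"),
    ]
    for triple in greedy_snippet(data, k=3):
        print(triple)
```

Greedy selection is a common baseline for coverage-style objectives; the article's own formulation and approximation guarantee may differ from this simplified setting.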
Funder
NSFC
Qing Lan Program of Jiangsu Province
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications
Cited by
10 articles.
1. ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries;Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval;2024-07-10
2. Enhancing Dataset Search with Compact Data Snippets;Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval;2024-07-10
3. Dense Re-Ranking with Weak Supervision for RDF Dataset Search;The Semantic Web – ISWC 2023;2023
4. Dataset Search over Integrated Metadata from China’s Public Data Open Platforms;Big Data;2023
5. ACORDAR;Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval;2022-07-06