Affiliation:
1. Department of Computer Science, Stanford University, Stanford, CA
Abstract
When dealing with semistructured data such as that available on the Web, it becomes important to infer the inherent structure, both for the user (e.g., to facilitate querying) and for the system (e.g., to optimize access). In this paper, we consider the problem of identifying some underlying structure in large collections of semistructured data. Since we expect the data to be fairly irregular, this structure consists of an approximate classification of objects into a hierarchical collection of types. We propose a notion of a type hierarchy for such data and outline both a method for deriving the type hierarchy and rules for assigning types to data elements.
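The abstract does not spell out the derivation algorithm. As a rough illustration only, the sketch below treats each object as the set of its attribute labels, keeps frequently occurring label sets as types, assigns every object to the nearest type, and orders types by label-set inclusion. The support threshold, the symmetric-difference distance, and all function names (`signature`, `candidate_types`, `assign_types`, `hierarchy`) are assumptions made for this example, not the method proposed in the paper.

```python
from collections import Counter

def signature(obj: dict) -> frozenset:
    # An object's "signature": the set of attribute labels it carries.
    return frozenset(obj)

def candidate_types(objects: dict, min_support: int) -> list:
    # Hypothetical heuristic: only label sets shared by at least
    # `min_support` objects become types; rarer signatures do not.
    counts = Counter(signature(o) for o in objects.values())
    return sorted((s for s, n in counts.items() if n >= min_support), key=sorted)

def assign_types(objects: dict, types: list) -> dict:
    # Approximate classification: each object goes to the type whose label
    # set differs least from its own (size of the symmetric difference);
    # ties are broken by the type's sorted label list.
    return {
        oid: min(types, key=lambda t: (len(t ^ signature(o)), sorted(t)))
        for oid, o in objects.items()
    }

def hierarchy(types: list) -> list:
    # (subtype, supertype) pairs: a type is a subtype of another when its
    # label set is a strict superset of the other's.
    return [(a, b) for a in types for b in types if b < a]

# Toy data: o5 has no exact type of its own and is classified approximately.
objects = {
    "o1": {"name": "Ann", "email": "ann@example.org"},
    "o2": {"name": "Bob", "email": "bob@example.org"},
    "o3": {"name": "Cy", "email": "cy@example.org", "phone": "555-0101"},
    "o4": {"name": "Di", "email": "di@example.org", "phone": "555-0102"},
    "o5": {"name": "Ed", "phone": "555-0103"},
}
types = candidate_types(objects, min_support=2)
assignment = assign_types(objects, types)
for oid, t in assignment.items():
    print(oid, sorted(t))
```

With `min_support=2`, o5 (which lacks an `email` label) is placed in the {email, name, phone} type, one label away from its own signature, while `hierarchy(types)` reports that type as a subtype of {email, name}. A real system would need a more careful notion of distance and of which label sets deserve to become types.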
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems, Software
Cited by
36 articles.
1. A survey on semantic schema discovery. The VLDB Journal, 2021-11-27.
2. Parametric schema inference for massive JSON datasets. The VLDB Journal, 2019-01-05.
3. Schema Discovery in RDF Data Sources. Conceptual Modeling, 2015.
4. Automating the formalization of product comparison matrices. Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, 2014-09-15.
5. Composing JSON-Based Web APIs. Lecture Notes in Computer Science, 2014.