Affiliation:
1. Univ. of Dortmund, Dortmund, W. Germany
2. Politecnico di Milano, Milan, Italy
3. IBM Almaden Research Center, San Jose, CA
Abstract
We describe a data model for structured office information objects, which we generically call “documents,” and a practically useful algebraic language for the retrieval and manipulation of such objects. Documents are viewed as hierarchical structures; their layout (presentation) aspect is to be treated separately. The syntax and semantics of the language are defined precisely in terms of the formal model, an extended relational algebra.
The proposed approach has several new features, some of which are particularly useful for the management of office information. The data model is based on nested sequences of tuples rather than nested relations. Therefore, sorting and sequence operations and the explicit handling of duplicates can be described by the model. Furthermore, this is the first model based on a many-sorted instead of a one-sorted algebra, which means that atomic data values as well as nested structures are objects of the algebra. As a consequence, arithmetic operations, aggregate functions, and so forth can be treated inside the model and need not be introduced as query language extensions to the model. Many-sorted algebra also allows arbitrary algebra expressions (with Boolean result) to be admitted as selection or join conditions and the results of arbitrary expressions to be embedded into tuples. In contrast to other formal models, this algebra can be used directly as a rich query language for office documents with precisely defined semantics.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications; General Business, Management and Accounting; Information Systems
Cited by
38 articles.