On evaluating an approach for balancing the trade‐off on XML schema design
Author:
Schroeder Rebeca, Duarte Denio, dos Santos Mello Ronaldo
Abstract
Purpose: Designing efficient XML schemas is essential for XML applications that manage semi‐structured data. When generating XML schemas, there are two opposing goals: avoiding redundancy and providing connected structures in order to achieve good query performance. In general, highly connected XML structures allow data redundancy, and redundancy‐free schemas generate disconnected XML structures. The purpose of this paper is to describe and experimentally evaluate an approach that balances this trade‐off through workload analysis. Additionally, it aims to identify the most frequently accessed data based on the workload and to suggest indexes that improve access performance.
Design/methodology/approach: The paper applies and evaluates a workload‐aware methodology that provides indexing and highly connected structures for data that are intensively accessed through paths traversed by the workload.
Findings: The paper presents benchmarking results for a set of XML schema design approaches and demonstrates that the XML schemas generated by the approach provide high query performance at a low cost in data redundancy, thereby balancing the trade‐off in XML schema design.
Research limitations/implications: Although an XML benchmark is applied in these experiments, further experiments in a real‐world application are expected.
Practical implications: The proposed approach may be applied in a real‐world process for designing new XML databases, as well as in reverse engineering processes to improve XML schemas from legacy databases.
Originality/value: Unlike related work, the reported approach integrates the two opposing goals of XML schema design and generates suitable schemas according to a workload. An experimental evaluation shows that the proposed methodology is promising.
Subject
Computer Networks and Communications, Information Systems