Efficient Querying Distributed Big-XML Data using MapReduce

Published:2016-07 Issue:3 Volume:8 Page:70-79
ISSN:1938-0259
Container-title:International Journal of Grid and High Performance Computing
language:en

Author:

Kunfang Song¹, Lu Hongwei¹

Affiliation:

1. Huazhong University of Science and Technology, Wuhan, China

Abstract

MapReduce is a widely adopted computing framework for data-intensive applications running on clusters. This paper proposes an approach that exploits data parallelism in XML processing using MapReduce on Hadoop. The authors' solution seamlessly integrates data storage, labeling, indexing, and parallel queries to process massive amounts of XML data. Specifically, the authors introduce an SDN labeling algorithm and a distributed hierarchical index based on DHTs. More importantly, they design an advanced two-phase MapReduce solution that efficiently addresses labeling, indexing, and query processing on big XML data. The experimental results show the efficiency and effectiveness of the proposed parallel XML processing approach on Hadoop.

Publisher

IGI Global

Subject

Computer Networks and Communications

Cited by 6 articles.

1. Integrated method for distributed processing of large XML data;Cluster Computing;2023-05-13

2. Efficient processing of complex XSD using Hive and Spark;PeerJ Computer Science;2021-08-17

3. A data locality based scheduler to enhance MapReduce performance in heterogeneous environments;Future Generation Computer Systems;2019-01

4. Efficient Storage and Parallel Query of Massive XML Data in Hadoop;Advances in Data Mining and Database Management;2019

5. Unsupervised blocking and probabilistic parallelisation for record matching of distributed big data;The Journal of Supercomputing;2017-03-16