DGFIndex for smart grid

Published: 2014-08, Volume: 7, Issue: 13, Pages: 1496-1507
ISSN: 2150-8097
Container title: Proceedings of the VLDB Endowment
Language: en
Short container title: Proc. VLDB Endow.

Authors:

Liu Yue¹, Hu Songlin², Rabl Tilmann³, Liu Wantao², Jacobsen Hans-Arno³, Wu Kaifeng⁴, Chen Jian⁵, Li Jintao²

Affiliation:

1. Chinese Academy of Sciences, China and University of Chinese Academy of Sciences, China

2. Chinese Academy of Sciences, China

3. Middleware Systems Research Group, University of Toronto, Canada

4. State Grid Electricity Science Research Institute, China

5. Zhejiang Electric Power Corporation, China

Abstract

In Smart Grid applications, as the number of deployed electric smart meters increases, massive amounts of valuable meter data are generated and collected every day. To enable reliable data collection and fast business decisions, high-throughput storage and high-performance analysis of massive meter data have become crucial for grid companies. Because of their efficiency, fault tolerance, and price-performance, Hadoop and Hive are frequently deployed as the underlying platform for big data processing. However, in real business use cases, these data analysis applications typically involve multidimensional range queries (MDRQ) as well as batch reading and statistics on the meter data. While Hive performs well at batch reading and complex analysis, it lacks efficient indexing techniques for MDRQ. In this paper, we propose DGFIndex, an index structure for Hive that efficiently supports MDRQ over massive meter data. DGFIndex divides the data space into cubes using the grid file technique. Unlike the existing indexes in Hive, which store all combinations of multiple dimensions, DGFIndex stores only the information of cubes. This leads to a smaller index and faster query processing. Furthermore, by pre-computing user-defined aggregations for each cube, DGFIndex only needs to access the boundary region for an aggregation query. Our comprehensive experiments show that DGFIndex saves significant disk space compared with the existing indexes in Hive, and that query performance with DGFIndex is 2-50 times faster than existing indexes in Hive and HadoopDB for aggregation queries, 2-5 times faster than both for non-aggregation queries, and 2-75 times faster than scanning the whole table at different query selectivities.
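As a rough illustration of the grid-file idea described in the abstract, the sketch below cuts a two-dimensional space (for example meter ID by hour, an assumed choice of dimensions) into fixed-width cubes, keeps a pre-computed SUM per cube, and answers a range-aggregation query by using the stored SUM for cubes that lie fully inside the query range while scanning rows only in the boundary cubes. The class `GridAggIndex`, the method `range_sum`, the uniform split widths, and the in-memory row storage are all hypothetical; this is a minimal sketch of the technique under those assumptions, not the DGFIndex implementation, which works over Hive tables rather than Python lists.

```python
from collections import defaultdict
from itertools import product


class GridAggIndex:
    """Hypothetical grid-file index with a pre-computed SUM per cube.

    Assumption: the space starts at `mins` and is split into cubes of a
    fixed width per dimension (`widths`); real split policies may differ.
    """

    def __init__(self, mins, widths):
        self.mins = mins
        self.widths = widths
        self.records = defaultdict(list)  # cube key -> raw (point, value) rows
        self.sums = defaultdict(float)    # cube key -> pre-computed SUM(value)

    def cube_of(self, point):
        # Integer cube coordinate of a point on each dimension.
        return tuple(int((p - m) // w)
                     for p, m, w in zip(point, self.mins, self.widths))

    def insert(self, point, value):
        key = self.cube_of(point)
        self.records[key].append((point, value))
        self.sums[key] += value  # maintain the pre-aggregate at load time

    def range_sum(self, lo, hi):
        """Return (SUM(value), rows_scanned) for lo[d] <= x[d] <= hi[d]."""
        lo_key, hi_key = self.cube_of(lo), self.cube_of(hi)
        total, scanned = 0.0, 0
        ranges = [range(a, b + 1) for a, b in zip(lo_key, hi_key)]
        for key in product(*ranges):
            if key not in self.records:  # empty cube: nothing stored
                continue
            # Cube covers [c_lo[d], c_lo[d] + widths[d]) on each dimension.
            c_lo = [m + k * w for m, k, w in zip(self.mins, key, self.widths)]
            inner = all(q_lo <= c and c + w <= q_hi
                        for q_lo, c, w, q_hi in zip(lo, c_lo, self.widths, hi))
            if inner:
                total += self.sums[key]  # inner cube: reuse pre-computed SUM
            else:
                # Boundary cube: scan its rows and filter by the query range.
                for point, value in self.records[key]:
                    scanned += 1
                    if all(q_lo <= x <= q_hi
                           for q_lo, x, q_hi in zip(lo, point, hi)):
                        total += value
        return total, scanned


if __name__ == "__main__":
    # Toy load: 100 meters x 100 hourly readings of 1.0 each (made-up data).
    idx = GridAggIndex(mins=(0, 0), widths=(10, 10))
    for meter in range(100):
        for hour in range(100):
            idx.insert((meter, hour), 1.0)
    total, scanned = idx.range_sum((5, 5), (94, 94))
    print(total, scanned)  # 8100.0 3600
```

In this toy run, the 64 cubes lying entirely inside the query contribute through their stored SUM, and only the 36 boundary cubes are scanned row by row, which is the effect the abstract attributes to pre-computed aggregations. The inner-cube test is conservative: a cube it cannot prove is fully covered is scanned and filtered, so results stay exact.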

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences, Water Science and Technology, Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/2733004.2733021

Cited by 14 articles.

1. Review of Smart Grid Failure Prediction and the Need for its Study in STEM Careers. Lecture Notes in Educational Technology, 2023.

2. Constructing Compact Time Series Index for Efficient Window Query Processing. 2022 IEEE 38th International Conference on Data Engineering (ICDE), 2022-05.

3. Design of Management Platform Architecture and Key Algorithm for Massive Monitoring Big Data. Wireless Communications and Mobile Computing, 2021-09-28.

4. An efficient parallel indexing structure for multi-dimensional big data using spark. The Journal of Supercomputing, 2021-03-22.

5. Batch-file Operations to Optimize Massive Files Accessing. ACM Transactions on Storage, 2020-08-14.