Managing large multidimensional hydrologic datasets: A case study comparing NetCDF and SciDB

Author:

Liu Haicheng (1), van Oosterom Peter (1), Tijssen Theo (1), Commandeur Tom (1), Wang Wen (2)

Affiliation:

1. Faculty of Architecture and the Built Environment, Delft University of Technology, Julianalaan 134, 2628 BL Delft, The Netherlands

2. State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Hohai University, Nanjing, 210098, China

Abstract

Management of large hydrologic datasets, including storage, structuring, clustering, indexing, and querying, is one of the crucial challenges in the era of big data. This research originates from a specific problem: extracting time series at specific locations takes a long time when a large multidimensional (MD) dataset is stored in the NetCDF classic or the 64-bit offset format. The root of this issue is the contiguous storage structure adopted by NetCDF. In this research, NetCDF file-based solutions and an MD array database management system (DBMS) that applies a chunked storage structure are benchmarked to determine the best solution for storing and querying large MD hydrologic datasets. Experts were consulted to establish the benchmark sets, and the HydroNET-4 system provided the benchmark environment. The final benchmark tests explore how data storage configurations, namely chunk size, dimension order (spatio-temporal clustering), and compression, affect query performance. Results indicate that for big hydrologic MD data management, the properly chunked NetCDF-4 solution without compression is, in general, more efficient than the SciDB DBMS. However, the benefits of a DBMS should not be neglected, for example, integration with other data types, smart caching strategies, transaction support, scalability, and out-of-the-box support for parallelization.
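The trade-off between contiguous and chunked storage described in the abstract can be illustrated with a small chunk-counting sketch. This is not the authors' benchmark: the array dimensions, the three chunk shapes, and the helper `chunks_touched` are all hypothetical, and the contiguous classic layout is approximated here as one chunk per time step, on the assumption that time is the slowest-varying dimension.

```python
def chunks_touched(chunk_shape, start, count):
    """Number of chunks intersected by a hyperslab query (start, count)."""
    total = 1
    for c, s, n in zip(chunk_shape, start, count):
        first = s // c            # index of the first chunk hit on this axis
        last = (s + n - 1) // c   # index of the last chunk hit on this axis
        total *= last - first + 1
    return total


# Hypothetical dataset: hourly grids for one year, ordered (time, y, x).
N_TIME, N_Y, N_X = 8760, 500, 500

# Hypothetical storage layouts, expressed as chunk shapes.
layouts = {
    # Assumed stand-in for contiguous storage with time varying slowest.
    "slab_per_time_step": (1, N_Y, N_X),
    # A compromise between temporal and spatial access.
    "balanced": (365, 50, 50),
    # Chunks that favour time series extraction.
    "time_oriented": (N_TIME, 10, 10),
}

# Two query types: a full time series at one cell, and one full spatial grid.
time_series_query = ((0, 250, 250), (N_TIME, 1, 1))
single_map_query = ((100, 0, 0), (1, N_Y, N_X))

results = {
    name: (
        chunks_touched(chunk, *time_series_query),
        chunks_touched(chunk, *single_map_query),
    )
    for name, chunk in layouts.items()
}

for name, (ts_chunks, map_chunks) in results.items():
    print(f"{name:20s} time series: {ts_chunks:5d} chunks, map: {map_chunks:5d} chunks")
```

The counts ignore chunk size in bytes, caching, and compression, so they only indicate how many separate reads each layout implies for each query type, not actual query times.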

Publisher

IWA Publishing

Subject

Atmospheric Science, Geotechnical Engineering and Engineering Geology, Civil and Structural Engineering, Water Science and Technology

