Array databases: concepts, standards, implementations

Published:2021-02-02 Issue:1 Volume:8 Page:
ISSN:2196-1115
Container-title:Journal of Big Data
language:en
Short-container-title:J Big Data

Author:

Baumann Peter, Misev Dimitar, Merticariu Vlad, Huu Bang Pham

Abstract

Multi-dimensional arrays (also known as raster data or gridded data) play a key role in many, if not all, science and engineering domains, where they typically represent spatio-temporal sensor, image, simulation output, or statistics “datacubes”. As classic database technology does not support arrays adequately, such data today are maintained mostly in silo solutions, with architectures that tend to erode and fail to keep up with increasing requirements on performance and service quality. Array Database systems attempt to close this gap by providing declarative query support for flexible ad-hoc analytics on large n-D arrays, similar to what SQL offers on set-oriented data, XQuery on hierarchical data, and SPARQL and Cypher on graph data. Today, Petascale Array Database installations exist, employing massive parallelism and distributed processing. Hence, questions arise about the technology and standards available, usability, and overall maturity. Several papers have compared models and formalisms, and benchmarks have been undertaken as well, typically comparing two systems against each other. While each of these represents valuable research, to the best of our knowledge there is no comprehensive survey combining model, query language, architecture, practical usability, and performance aspects. The size of this comparison also differentiates our study: 19 systems are compared and four are benchmarked, to an extent and depth clearly exceeding previous papers in the field; for example, subsetting tests were designed so that systems cannot be tuned specifically to these queries. It is hoped that this gives a representative overview to all who want to immerse themselves in the field, as well as clear guidance to those who need to choose the best-suited datacube tool for their application.

This article presents results of the Research Data Alliance (RDA) Array Database Assessment Working Group (ADA:WG), a subgroup of the Big Data Interest Group. It has elicited the state of the art in Array Databases, technically supported by IEEE GRSS and CODATA Germany, to answer the question: how can data scientists and engineers benefit from Array Database technology? As it turns out, Array Databases can offer significant advantages in terms of flexibility, functionality, extensibility, as well as performance and scalability. In total, the database approach of offering analysis-ready “datacubes” heralds a new level of service quality. Investigation shows that there is a lively ecosystem of technology with increasing uptake, and proven array analytics standards are in place. Consequently, such approaches have to be considered a serious option for datacube services in science, engineering and beyond. Tools, though, turn out to vary greatly in functionality and performance.

Funder

Projekt DEAL

Publisher

Springer Science and Business Media LLC

Subject

Information Systems and Management,Computer Networks and Communications,Hardware and Architecture,Information Systems

Link

http://link.springer.com/content/pdf/10.1186/s40537-020-00399-2.pdf

Reference: 126 articles.

1. Abadi M, et al. TensorFlow: A system for large-scale machine learning. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). 2016. p. 265–283.

2. Abadi D. On Big Data, analytics and Hadoop. ODBMS Industry Watch. 2012. http://www.odbms.org/blog/2012/12/on-big-data-analytics-and-hadoop-interview-with-daniel-abadi/. Accessed 23 Aug 2020.

3. Abadi M. TensorFlow: Learning functions at scale. Proc. ACM SIGPLAN Intl. Conference on Functional Programming. St Petersburg, USA, 2016.

4. Andrejev A, Baumann P, Misev D, Risch T. Spatio-temporal gridded data processing on the semantic web. Proc. Intl. Conf. on Data Science and Data Intensive Systems (DSDIS). Sydney, Australia, 2015.

5. Baumann P. A database array algebra for spatio-temporal data and beyond. Proc. Intl. Workshop on Next Generation Information Technologies and Systems (NGITS). Zikhron Yaakov, Israel. Springer LNCS 1649. 1999.

Cited by 28 articles.

1. Datacubes as enabler for advanced decision support systems in land management;Land Degradation & Development;2024-05-26

2. A geospatial decision support system to support policy implementation on climate change in EU;Land Degradation & Development;2024-01-18

3. Supporting the planning and management of biodiversity through the development of a geospatial decision support system;Land Degradation & Development;2024-01-18

4. Soil Science in Italy from 2000 to 2024;Soil Science in Italy;2024

5. Tools and Databases in Transcriptomics Analysis: Recent Knowledge and Advancements;Reference Module in Life Sciences;2024