Author:
Antunes Alex K., Winter Eric, Vandegriff Jon Duane, Thomas Brian A., Bradford Jeffrey W.
Abstract
Analysis of long-timespan heliophysics and space physics data, or application of machine learning algorithms, can require access to petabyte-scale and larger data sets and sufficient computational capacity to process such “big data”. We provide a summary of Python support and performance statistics for the major scientific data formats under consideration for access to heliophysics data in cloud computing environments. The Heliophysics Data Portal lists 21 different formats used in heliophysics and space physics; our study focuses on Python support for the most-used formats of CDF, FITS, and NetCDF4/HDF. In terms of package support, no single Python package supports all of the common heliophysics file types, and NetCDF/HDF5 is the most widely supported file type. In terms of technical implementation within a cloud environment, we profile file performance in Amazon Web Services (AWS). Effective use of AWS cloud-based storage requires Python libraries designed to read from AWS's S3 storage. In Python, S3-aware libraries exist for CDF, FITS, and NetCDF4/HDF. The existing libraries use different approaches to handling cloud-based data, each with tradeoffs. With these caveats, the current Python ecosystem pairs well with AWS’s cloud storage for existing heliophysics data, and cloud performance in Python is continually improving. We recommend that anyone considering cloud use, or optimization of data formats for cloud use, profile their own data set specifically, as instrument-specific data characteristics have a strong effect on which approach is best for cloud use.
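As an illustration of the profiling recommendation above, the following is a minimal sketch of a read-timing harness, not the method used in the paper. The function names `time_reads` and `profile_chunk_sizes` are hypothetical, and the demonstration reads from an in-memory buffer so that it runs anywhere. To profile a real data set, one would pass a callable that opens the actual file (for example a local path, or an S3 object opened through an S3-aware library) in place of the buffer.

```python
import io
import statistics
import time
from typing import BinaryIO, Callable, Dict, List


def time_reads(open_file: Callable[[], BinaryIO],
               chunk_size: int,
               repeats: int = 3) -> Dict[str, float]:
    """Time a full sequential read of one file using a fixed chunk size.

    `open_file` is a hypothetical hook: any zero-argument callable that
    returns a binary file-like object usable as a context manager.
    """
    durations: List[float] = []
    total_bytes = 0
    for _ in range(repeats):
        start = time.perf_counter()
        total_bytes = 0
        with open_file() as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total_bytes += len(chunk)
        durations.append(time.perf_counter() - start)
    return {
        "chunk_size": float(chunk_size),
        "bytes": float(total_bytes),
        "median_s": statistics.median(durations),
    }


def profile_chunk_sizes(open_file: Callable[[], BinaryIO],
                        chunk_sizes: List[int],
                        repeats: int = 3) -> List[Dict[str, float]]:
    """Run time_reads once per chunk size and collect the results."""
    return [time_reads(open_file, size, repeats) for size in chunk_sizes]


if __name__ == "__main__":
    # Stand-in data: 1 MiB of zeros held in memory (an assumption for the demo).
    payload = bytes(1024 * 1024)
    results = profile_chunk_sizes(lambda: io.BytesIO(payload),
                                  [4 * 1024, 64 * 1024, 1024 * 1024])
    for row in results:
        print(f"chunk={int(row['chunk_size']):>8d} B  "
              f"read={int(row['bytes'])} B  median={row['median_s']:.6f} s")
```

Timings from an in-memory buffer say nothing about cloud performance; the harness is only meaningful once `open_file` points at the storage and file format actually being evaluated.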
Funder
Goddard Space Flight Center
Subject
Astronomy and Astrophysics
Cited by
4 articles.