Challenges and Benchmark Datasets for Machine Learning in the Atmospheric Sciences: Definition, Status, and Outlook

Published:2022-07 Issue:3 Volume:1
ISSN:2769-7525
Container-title:Artificial Intelligence for the Earth Systems

Author:

Dueben Peter D.¹,Schultz Martin G.²,Chantry Matthew¹,Gagne David John³,Hall David Matthew⁴,McGovern Amy⁵

Affiliation:

1. European Centre for Medium-Range Weather Forecasts, Reading, United Kingdom

2. Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany

3. National Center for Atmospheric Research, Boulder, Colorado

4. NVIDIA Corporation, Santa Clara, California

5. University of Oklahoma, School of Computer Science and School of Meteorology, Norman, Oklahoma

Abstract

Benchmark datasets and benchmark problems have been a key aspect for the success of modern machine learning applications in many scientific domains. Consequently, an active discussion about benchmarks for applications of machine learning has also started in the atmospheric sciences. Such benchmarks allow for the comparison of machine learning tools and approaches in a quantitative way and enable a separation of concerns for domain and machine learning scientists. However, a clear definition of benchmark datasets for weather and climate applications is missing with the result that many domain scientists are confused. In this paper, we equip the domain of atmospheric sciences with a recipe for how to build proper benchmark datasets, a (nonexclusive) list of domain-specific challenges for machine learning is presented, and it is elaborated where and what benchmark datasets will be needed to tackle these challenges. We hope that the creation of benchmark datasets will help the machine learning efforts in atmospheric sciences to be more coherent, and, at the same time, target the efforts of machine learning scientists and experts of high-performance computing to the most imminent challenges in atmospheric sciences. We focus on benchmarks for atmospheric sciences (weather, climate, and air-quality applications). However, many aspects of this paper will also hold for other aspects of the Earth system sciences or are at least transferable.

Significance Statement

Machine learning is the study of computer algorithms that learn automatically from data. Atmospheric sciences have started to explore sophisticated machine learning techniques and the community is making rapid progress on the uptake of new methods for a large number of application areas. This paper provides a clear definition of so-called benchmark datasets for weather and climate applications that help to share data and machine learning solutions between research groups to reduce time spent in data processing, to generate synergies between groups, and to make tool developments more targeted and comparable. Furthermore, a list of benchmark datasets that will be needed to tackle important challenges for the use of machine learning in atmospheric sciences is provided.

Publisher

American Meteorological Society

Link

https://journals.ametsoc.org/downloadpdf/journals/aies/1/3/AIES-D-21-0002.1.xml

References: 56 articles.

1. TensorFlow: Large-scale machine learning on heterogeneous systems;Abadi, M.,2015

2. Statistical approaches to assimilate ASCAT soil moisture information—I. Methodologies and first assessment;Aires, F.,2021

3. Integrating machine learning and multiscale modeling-perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences;Alber, M.,2019

4. The characteristics of United States hail reports: 1955–2014;Allen, J. T.,2015

5. An extreme value model for U.S. hail size;Allen, J. T.,2017

Cited by 17 articles.

1. Artificial intelligence for climate prediction of extremes: State of the art, challenges, and future perspectives;WIREs Climate Change;2024-09-03

2. Towards practical artificial intelligence in Earth sciences;Computational Geosciences;2024-09-02

3. Efficient and stable coupling of the SuperdropNet deep-learning-based cloud microphysics (v0.1.0) with the ICON climate and weather model (v2.6.5);Geoscientific Model Development;2024-05-16

4. Physics-based and data-driven hybrid modeling in manufacturing: a review;Production & Manufacturing Research;2024-01-18

5. Novel Dataset Creation of Varieties of Banana and Ripening Stages for Machine Learning Applications;Communications in Computer and Information Science;2024