Abstract
Machine learning (ML) has been used to predict climatic parameters, and many successes have been reported in the literature. In this paper, we scrutinize the effectiveness of five widely used ML algorithms in the monthly prediction of seasonal climatic parameters from monthly image data. Specifically, we quantify the predictive performance of these algorithms on five climatic parameters using various combinations of features. We compare the predictive accuracy of the resulting trained ML models with that of basic statistical estimators computed directly from the training data. Our results show that the ML models never significantly outperform this statistical baseline and underperform it for most feature sets. Unlike previous studies of this kind, we provide error bars for the relative performance of different predictors, based on jackknife estimates applied to differences in predictive error magnitudes. We also show that shuffling data sequences, a practice employed in some previous studies, causes data leakage and therefore over-estimated performance. Ultimately, the paper demonstrates the importance of using well-grounded statistical techniques when producing and analyzing the results of ML predictive models.
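The error bars mentioned above come from jackknife estimates applied to differences in error magnitudes. The following is a minimal sketch of how such an estimate could be computed, not the paper's own code. It assumes paired errors from an ML model and a baseline on the same test months and uses the mean difference in absolute error as the statistic. The function name `jackknife_mean_diff` and the `block` parameter are hypothetical. The delete-a-block option is an added assumption, motivated by the autocorrelation of monthly series; the paper's exact jackknife scheme is not stated here.

```python
import numpy as np

def jackknife_mean_diff(err_model, err_baseline, block=1):
    """Delete-a-block jackknife for mean(|err_model| - |err_baseline|).

    Hypothetical helper (not from the paper). Inputs are paired errors on the
    same test points, in time order. Returns (estimate, standard_error); a
    negative estimate means the model's absolute errors are smaller on average.
    """
    d = np.abs(np.asarray(err_model, dtype=float)) - np.abs(
        np.asarray(err_baseline, dtype=float)
    )
    g = d.size // block            # number of whole contiguous blocks
    if g < 2:
        raise ValueError("need at least two blocks")
    d = d[: g * block]             # drop a trailing partial block, if any
    blocks = d.reshape(g, block)   # each row is one contiguous block in time
    total = blocks.sum()
    # Leave-one-block-out means of the error-magnitude difference
    loo = (total - blocks.sum(axis=1)) / (d.size - block)
    # Standard jackknife variance formula for g delete-one-group replicates
    se = np.sqrt((g - 1) / g * np.sum((loo - loo.mean()) ** 2))
    return d.mean(), se
```

With `block=1` this reduces to the ordinary delete-one jackknife, whose standard error for a mean equals the sample standard deviation of the differences divided by √n. A difference that is small relative to its standard error (for instance, within about two standard errors) would not indicate that either predictor is significantly better.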
Subject
Atmospheric Science, Environmental Science (miscellaneous)
Cited by 8 articles.