Development of a regional feature selection-based machine learning system (RFSML v1.0) for air pollution forecasting over China
-
Published:2022-10-24
Issue:20
Volume:15
Page:7791-7807
-
ISSN:1991-9603
-
Container-title:Geoscientific Model Development
-
language:en
-
Short-container-title:Geosci. Model Dev.
Author:
Fang Li, Jin JianbingORCID, Segers ArjoORCID, Lin Hai XiangORCID, Pang MijieORCID, Xiao Cong, Deng Tuo, Liao Hong
Abstract
Abstract. With the explosive growth of atmospheric data, machine learning models have achieved great success in air pollution forecasting because of their higher computational efficiency than the traditional chemical transport models. However, in previous studies, new prediction algorithms have only been tested at stations or in a small region; a large-scale air quality forecasting model remains lacking to date. Huge dimensionality also means that redundant input data may lead to increased complexity and therefore the over-fitting of machine learning models. Feature selection is a key topic in machine learning development, but it has not yet been explored in atmosphere-related applications. In this work, a regional feature selection-based machine learning (RFSML) system was developed, which is capable of predicting air quality in the short term with high accuracy at the national scale. Ensemble-Shapley additive global importance analysis is combined with the RFSML system to extract significant regional features and eliminate redundant variables at an affordable computational expense. The significance of the regional features is also explained physically. Compared with a standard machine learning system fed with relative features, the RFSML system driven by the selected key features results in superior interpretability, less training time, and more accurate predictions. This study also provides insights into the difference in interpretability among machine learning models (i.e., random forest, gradient boosting, and multi-layer perceptron models).
Funder
National Natural Science Foundation of China Natural Science Foundation of Jiangsu Province
Publisher
Copernicus GmbH
Reference88 articles.
1. Abu Awad, Y., Koutrakis, P., Coull, B. A., and Schwartz, J.: A spatio-temporal prediction model based on support vector machine regression: Ambient Black Carbon in three New England States, Environ. Res., 159, 427–434, https://doi.org/10.1016/j.envres.2017.08.039, 2017. a 2. Altmann, A., Toloşi, L., Sander, O., and Lengauer, T.: Permutation importance: a corrected feature importance measure, Bioinformatics, 26, 1340–1347, https://doi.org/10.1093/bioinformatics/btq134, 2010. a 3. Bai, Y., Li, Y., Zeng, B., Li, C., and Zhang, J.: Hourly PM2.5 concentration forecast using stacked autoencoder model with emphasis on seasonality, J. Clean. Prod., 224, 739–750, 2019. a 4. Bartier, P. M. and Keller, C.: Multivariate interpolation to incorporate thematic surface data using inverse distance weighting (IDW), Comput. Geosci., 22, 795–799, https://doi.org/10.1016/0098-3004(96)00021-0, 1996. a 5. Bey, I., Jacob, D. J., Yantosca, R. M., Logan, J. A., Field, B. D., Fiore, A. M., Li, Q., Liu, H. Y., Mickley, L. J., and Schultz, M. G.: Global modeling of tropospheric chemistry with assimilated meteorology: Model description and evaluation, J. Geophys. Res.-Atmos., 106, 23073–23095, https://doi.org/10.1029/2001JD000807, 2001. a
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
|
|