Abstract
With rising environmental concerns, accurate air quality prediction has become paramount, as it supports the planning of preventive measures and policies against the health hazards and environmental problems caused by poor air quality. Air quality data are usually time series. However, missing values arise for various reasons and are often encountered during data preparation and aggregation. Failing to analyze and handle missing data properly significantly hinders the data analysis process. To address this issue, this paper offers an extensive review of air quality prediction and missing data imputation techniques for time series, particularly in relation to environmental challenges. In addition, we empirically assess eight imputation methods (mean, median, kNNI, MICE, SAITS, BRITS, MRNN, and Transformer) to scrutinize their impact on air quality data. The evaluation is conducted on diverse air quality datasets gathered from numerous cities around the world. Based on these evaluations, we offer practical recommendations for practitioners dealing with missing data in environmental time series.
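The compared methods range from simple statistical fills (mean, median) and neighbour-based imputation (kNNI) to model-based approaches (MICE and the deep models SAITS, BRITS, MRNN, and Transformer). As a rough illustration of how such methods are commonly compared, the sketch below implements only the three simplest baselines in plain Python and scores them by hiding known values and measuring the mean absolute error (MAE) on them. The masking protocol, the RMS distance used for kNN, the function names (`impute_statistic`, `impute_knn`, `mask`, `mae`), and the toy PM2.5/PM10 numbers are all assumptions for illustration, not the paper's implementation or data.

```python
import math
import statistics
from typing import Callable, List, Optional, Sequence, Tuple

Row = List[Optional[float]]  # one timestamp; None marks a missing reading


def observed(data: Sequence[Row], j: int) -> List[float]:
    """Non-missing values of column j (e.g. one pollutant)."""
    return [row[j] for row in data if row[j] is not None]


def impute_statistic(data: Sequence[Row],
                     stat: Callable[[List[float]], float]) -> List[List[float]]:
    """Mean/median imputation: fill each gap with a per-column statistic."""
    fills = [stat(observed(data, j)) for j in range(len(data[0]))]
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in data]


def impute_knn(data: Sequence[Row], k: int = 3) -> List[List[float]]:
    """kNN imputation: average column j over the k most similar rows observing j.

    Similarity is the RMS difference over features observed in both rows
    (an illustrative choice; other distances are common).
    """
    result = []
    for i, row in enumerate(data):
        filled = list(row)
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for r, other in enumerate(data):
                if r == i or other[j] is None:
                    continue
                diffs = [(a - b) ** 2 for a, b in zip(row, other)
                         if a is not None and b is not None]
                if diffs:  # skip rows that share no observed feature
                    candidates.append((math.sqrt(sum(diffs) / len(diffs)), other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [val for _, val in candidates[:k]]
            # Fall back to the column mean when no comparable row exists.
            filled[j] = (statistics.mean(nearest) if nearest
                         else statistics.mean(observed(data, j)))
        result.append(filled)
    return result


def mask(data: Sequence[Row], positions: Sequence[Tuple[int, int]]):
    """Hide known values so imputation error can be measured against them."""
    masked = [list(row) for row in data]
    held_out = [(i, j, data[i][j]) for i, j in positions]
    for i, j in positions:
        masked[i][j] = None
    return masked, held_out


def mae(imputed: Sequence[Sequence[float]], held_out) -> float:
    """Mean absolute error over the artificially hidden entries."""
    return sum(abs(imputed[i][j] - v) for i, j, v in held_out) / len(held_out)


if __name__ == "__main__":
    # Synthetic toy series (hypothetical): columns are (PM2.5, PM10) at five timestamps.
    complete = [[10.0, 20.0], [12.0, 24.0], [13.0, 26.0], [14.0, 28.0], [30.0, 60.0]]
    masked, held_out = mask(complete, [(2, 0), (3, 1)])
    methods = {
        "mean": impute_statistic(masked, statistics.mean),
        "median": impute_statistic(masked, statistics.median),
        "kNNI (k=2)": impute_knn(masked, k=2),
    }
    for name, imputed in methods.items():
        print(f"{name}: MAE = {mae(imputed, held_out):.2f}")
```

Mean and median fills ignore temporal order and the correlations between pollutants, which is one reason model-based methods are also evaluated. In a setup like this, a method such as MICE or SAITS would replace the fill function while the same mask-and-score loop stays unchanged.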
Funder: Vietnam National University Ho Chi Minh City
Publisher: Public Library of Science (PLoS)