Evaluation of multivariate time series clustering for imputation of air pollution data
Published: 2021-11-03
Issue: 2
Volume: 10
Page: 265-285
ISSN: 2193-0864
Container-title: Geoscientific Instrumentation, Methods and Data Systems
Language: en
Short-container-title: Geosci. Instrum. Method. Data Syst.
Author:
Alahamade Wedad, Lake Iain, Reeves Claire E., De La Iglesia Beatriz
Abstract
Air pollution is one of the world's leading risk factors for death, with 6.5 million deaths per year worldwide attributed to air-pollution-related diseases. Understanding the behaviour of certain pollutants through air quality assessment can produce improvements in air quality management that will translate to health and economic benefits. However, problems with missing data and uncertainty hinder that assessment. We are motivated by the need to enhance the air pollution data available. We focus on the problem of missing air pollutant concentration data either because a limited set of pollutants is measured at a monitoring site or because an instrument is not operating, so a particular pollutant is not measured for a period of time. In our previous work, we have proposed models which can impute a whole missing time series to enhance air quality monitoring. Some of these models are based on a multivariate time series (MVTS) clustering method. Here, we apply our method to real data and show how different graphical and statistical model evaluation functions enable us to select the imputation model that produces the most plausible imputations. We then compare the Daily Air Quality Index (DAQI) values obtained after imputation with observed values incorporating missing data. Our results show that using an ensemble model that aggregates the spatial similarity obtained by the geographical correlation between monitoring stations and the fused temporal similarity between pollutant concentrations produces very good imputation results. Furthermore, the analysis enhances understanding of the different pollutant behaviours and of the characteristics of different stations according to their environmental type.
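To make the idea of combining spatial and temporal similarity concrete, the following is a minimal Python sketch of imputing a whole missing pollutant series at one station from other stations. It is not the authors' model: the function names, the use of mean Pearson correlation as a "fused" temporal similarity, the exponential distance decay with a hypothetical `scale_km` parameter, the equal weighting of the two similarities, and the synthetic data are all assumptions made for illustration. The sketch also assumes the donor series are complete (no gaps).

```python
import numpy as np


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlmb = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * np.arcsin(np.sqrt(a))


def combined_similarity(target_shared, donors_shared, target_latlon,
                        donors_latlon, scale_km=100.0):
    """Average a temporal and a spatial similarity for each donor station.

    target_shared: (T, P) pollutants observed at the target station.
    donors_shared: (S, T, P) the same pollutants at S donor stations.
    The weighting scheme and scale_km are illustrative assumptions.
    """
    n_donors, _, n_pollutants = donors_shared.shape
    temporal = np.empty(n_donors)
    for s in range(n_donors):
        corrs = [
            np.corrcoef(target_shared[:, p], donors_shared[s, :, p])[0, 1]
            for p in range(n_pollutants)
        ]
        # Fuse per-pollutant correlations by averaging, then map [-1, 1] to [0, 1].
        temporal[s] = (np.mean(corrs) + 1.0) / 2.0
    dist = haversine_km(target_latlon[0], target_latlon[1],
                        donors_latlon[:, 0], donors_latlon[:, 1])
    spatial = np.exp(-dist / scale_km)  # nearer stations score closer to 1
    return 0.5 * (temporal + spatial)


def impute_missing_series(similarity, donors_missing, k=2):
    """Impute the whole series as a similarity-weighted mean of the k best donors.

    donors_missing: (S, T) the pollutant that is unmeasured at the target.
    """
    top = np.argsort(similarity)[::-1][:k]
    weights = similarity[top] / similarity[top].sum()
    return weights @ donors_missing[top]


# Synthetic example: three donor stations, 48 time steps, two shared pollutants.
rng = np.random.default_rng(0)
t = np.arange(48)
target_shared = np.column_stack([np.sin(t / 4.0), np.cos(t / 6.0)])
donors_shared = np.stack([
    target_shared + 0.01 * rng.standard_normal(target_shared.shape),  # similar
    -target_shared,                                                   # opposite
    rng.standard_normal(target_shared.shape),                         # unrelated
])
donors_missing = np.stack([
    20 + 5 * np.sin(t / 4.0),
    60 - 5 * np.sin(t / 4.0),
    35 + rng.standard_normal(48),
])
target_latlon = (52.6, 1.3)  # hypothetical coordinates
donors_latlon = np.array([[52.7, 1.2], [55.9, -3.2], [51.5, -0.1]])

similarity = combined_similarity(target_shared, donors_shared,
                                 target_latlon, donors_latlon)
imputed = impute_missing_series(similarity, donors_missing, k=2)
```

In this toy setup the nearby, well-correlated donor receives the largest weight, so the imputed series is dominated by its measurements. A clustering-based approach such as the one the abstract describes would instead choose donors by cluster membership rather than by a fixed `k`.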
Publisher
Copernicus GmbH
Subject
Atmospheric Science,Geology,Oceanography