Affiliation:
1. School of Computer, Data and Mathematical Sciences, Western Sydney University, Locked Bag 1797, Penrith, NSW 2751, Australia
Abstract
Air quality data sets are widely used in numerous analyses. Missing values are ubiquitous in these data sets because the data are collected through sensors. Recovering missing data is a challenging task in the data preprocessing stage, and it becomes more challenging for time series data because time is an implicit variable that cannot be ignored. Existing methods for handling missing data in time series perform well when the percentage of missing values is relatively low and the gaps are small, but their performance is noticeably lower for large gaps. This paper presents a novel algorithm based on seasonal decomposition and elastic net regression to impute large gaps in time series data when correlated variables are available. In imputing large gaps, the method outperforms several existing univariate approaches, namely Kalman smoothing on ARIMA models, Kalman smoothing on structural time series models, linear interpolation, and mean imputation. However, it is applicable only when one or more variables are correlated with the time series that contains the large gaps.
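The abstract does not give the algorithm's steps, so the following is only a rough sketch of the general idea it names: remove a seasonal component, regress the remainder on a correlated variable with an elastic net penalty, and add the seasonal component back inside the gap. Everything here is an assumption made for illustration and not the authors' implementation: the per-phase seasonal means stand in for a proper seasonal decomposition, the predictor is assumed to be fully observed, the function names (`seasonal_means`, `elastic_net`, `impute_gap`) and the penalty settings are hypothetical, and the data are synthetic.

```python
import numpy as np

def seasonal_means(x, period):
    """Crude seasonal component: mean of the observed values at each phase."""
    phase = np.arange(len(x)) % period
    means = np.array([np.nanmean(x[phase == p]) for p in range(period)])
    return means[phase]

def elastic_net(X, y, alpha=0.01, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - Xw - b||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring handles the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back.
            r = yc - Xc @ w + Xc[:, j] * w[j]
            rho = Xc[:, j] @ r / n
            soft = np.sign(rho) * max(abs(rho) - alpha * l1_ratio, 0.0)
            w[j] = soft / (col_sq[j] + alpha * (1.0 - l1_ratio))
    return w, y_mean - x_mean @ w

def impute_gap(target, predictors, period):
    """Fill NaNs in `target` using fully observed, correlated `predictors`."""
    target = np.asarray(target, dtype=float)
    predictors = np.asarray(predictors, dtype=float)
    missing = np.isnan(target)
    season_y = seasonal_means(target, period)
    season_X = np.column_stack([seasonal_means(c, period) for c in predictors.T])
    resid_X = predictors - season_X
    # Fit on the deseasonalised, observed part of the series only.
    w, b = elastic_net(resid_X[~missing], (target - season_y)[~missing])
    filled = target.copy()
    filled[missing] = season_y[missing] + resid_X[missing] @ w + b
    return filled

# Synthetic illustration only: a daily cycle plus a shared random driver.
rng = np.random.default_rng(0)
period = 24
t = np.arange(240)
season = 5.0 * np.sin(2 * np.pi * t / period)
driver = rng.normal(size=t.size)
predictor = 0.5 * season + driver
truth = 10.0 + season + 2.0 * driver + 0.1 * rng.normal(size=t.size)

gap = slice(100, 148)                        # one large gap of two full cycles
observed = truth.copy()
observed[gap] = np.nan

filled = impute_gap(observed, predictor[:, None], period)
rmse_model = np.sqrt(np.mean((filled[gap] - truth[gap]) ** 2))
rmse_mean = np.sqrt(np.mean((np.nanmean(observed) - truth[gap]) ** 2))
print(f"RMSE, sketch: {rmse_model:.3f}; RMSE, mean imputation: {rmse_mean:.3f}")
```

The comparison against mean imputation mirrors one of the baselines listed in the abstract, but on toy data it says nothing about the paper's reported results. In practice a library decomposition and a cross-validated elastic net would likely replace the hand-rolled pieces used here.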
Subject
Atmospheric Science, Environmental Science (miscellaneous)
Cited by
3 articles.