Abstract
Rapid industrialization and urban development are the main causes of air pollution, leading to daily air quality and health problems. To identify significant pollutants and forecast their concentrations, this study used a hybrid methodology that combines integrated variable selection, an autoregressive distributed lag model, and the removal of collinear variables to reduce the number of variables, and then applied six intelligent time series models to forecast the concentrations of the top three pollution sources. We collected two air quality datasets from traffic and industrial monitoring stations, together with weather data, and analyzed and compared the results. The results show that a random forest based on the selected key variables has better classification metrics (accuracy, AUC, recall, precision, and F1). After removing collinear independent variables and adding lag periods using the autoregressive distributed lag model, intelligent time-series support vector regression was found to have better forecasting performance (RMSE and MAE). Finally, the research results could serve as a reference for relevant stakeholders and help them respond to poor air quality.
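As a rough illustration of two ideas named in the abstract, adding lag periods to a pollutant series and scoring forecasts with RMSE and MAE, the following minimal sketch may help. It is not the authors' pipeline: the readings are made-up numbers, the helper names (`add_lags`, `rmse`, `mae`) are hypothetical, and a naive persistence forecast stands in for the autoregressive distributed lag and support vector regression models used in the study.

```python
import math

def add_lags(series, n_lags):
    """Build lagged features: each row of X holds the n_lags values preceding y."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical pollutant readings (illustrative values only, not from the study).
pm25 = [12.0, 15.0, 14.0, 18.0, 21.0, 19.0, 17.0]
X, y = add_lags(pm25, n_lags=2)

# Assumption: a persistence baseline (predict the most recent lag) replaces
# the forecasting models described in the abstract.
baseline = [row[-1] for row in X]
print("RMSE:", rmse(y, baseline))
print("MAE:", mae(y, baseline))
```

In practice, the lagged matrix `X` would be passed to a regression model, and the same two metrics would be computed on held-out data to compare models.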
Subject
Atmospheric Science, Environmental Science (miscellaneous)