Parsimonious Random-Forest-Based Land-Use Regression Model Using Particulate Matter Sensors in Berlin, Germany

Author:

Venkatraman Jagatha Janani 1, Schneider Christoph 1, Sauter Tobias 1

Affiliation:

1. Geography Department, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany

Abstract

Machine learning (ML) methods are widely used in particulate matter prediction modelling, especially with air quality sensor data. Despite their advantages, the black-box nature of these methods obscures how a prediction has been made. Major issues with these types of models include data quality and computational intensity. In this study, we applied feature selection using recursive feature elimination and global sensitivity analysis to a random-forest (RF)-based land-use regression model developed for the city of Berlin, Germany. Land-use-based predictors, including local climate zones, leaf area index, daily traffic volume, population density, building types, building heights, and street types, were used to create a baseline RF model. Five additional models, three using recursive feature elimination and two using a Sobol-based global sensitivity analysis (GSA), were implemented, and their performance was compared against that of the baseline RF model. The predictors that had a large effect on the prediction according to both methods are discussed. Through feature elimination, the number of predictors was reduced from 220 in the baseline model to eight in the parsimonious models without sacrificing model performance. A comparison of the model metrics showed that the GSA_parsimonious model performs better than the baseline model, reducing the mean absolute error (MAE) from 8.69 µg/m³ to 3.6 µg/m³ and the root mean squared error (RMSE) from 9.86 µg/m³ to 4.23 µg/m³ when the trained model is applied to reference station data. This better performance is made possible by curtailing the uncertainties propagated through the model via the reduction of multicollinear and redundant predictors. Validated against reference stations, the parsimonious model predicted PM2.5 concentrations with an MAE of less than 5 µg/m³ at 10 out of 12 locations.
The GSA_parsimonious model performed best in all model metrics and improved the R² from 3% in the baseline model to 17%. However, the predictions exhibited a degree of uncertainty, making the model unreliable for regional-scale modelling. The GSA_parsimonious model can nevertheless be adapted to local scales to highlight the land-use parameters that are indicative of PM2.5 concentrations in Berlin. Overall, population density, leaf area index, and traffic volume are the major predictors of PM2.5, while building type and local climate zones are less significant predictors. Feature selection based on sensitivity analysis has a large impact on model performance. Optimising models through sensitivity analysis can enhance the interpretability of the model dynamics and potentially reduce computational costs and time when modelling is performed for larger areas.
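
As a rough illustration of two ideas named in the abstract, the sketch below computes MAE and RMSE and estimates first-order Sobol indices with a pick-freeze (Saltelli-type) Monte Carlo estimator. It is not the authors' implementation: the function names, the toy linear model standing in for a trained RF, and the assumption that inputs are independent and uniform on [0, 1] are all hypothetical choices made for this example.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def first_order_sobol(model, n_inputs, n_samples=20000, seed=0):
    """Estimate first-order Sobol indices by pick-freeze sampling.

    Assumption (for this sketch only): every input is independent and
    uniformly distributed on [0, 1]. `model` maps a list of inputs to a float.
    """
    rng = random.Random(seed)
    # Two independent sample matrices A and B.
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    f_a = [model(row) for row in a]
    f_b = [model(row) for row in b]

    # Total output variance, estimated from both sample sets.
    outputs = f_a + f_b
    mean = sum(outputs) / len(outputs)
    variance = sum((v - mean) ** 2 for v in outputs) / len(outputs)

    indices = []
    for i in range(n_inputs):
        # A with column i replaced by column i of B.
        f_ab = [
            model(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(a, b)
        ]
        # Partial variance due to input i; centring f_b reduces estimator noise.
        v_i = sum(
            (fb - mean) * (fab - fa) for fb, fab, fa in zip(f_b, f_ab, f_a)
        ) / n_samples
        indices.append(v_i / variance)
    return indices


def toy_model(x):
    """Hypothetical stand-in for a trained model: x[2] has no influence."""
    return 4.0 * x[0] + 2.0 * x[1] + 0.0 * x[2]


if __name__ == "__main__":
    print("MAE :", mae([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))
    print("RMSE:", rmse([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))
    print("First-order Sobol indices:", first_order_sobol(toy_model, 3))
```

For the toy model the analytical first-order indices are 0.8, 0.2 and 0, so an input with an index near zero is a candidate for removal, which is the intuition behind GSA-based feature selection. In practice a dedicated library would normally be used in place of this hand-written estimator.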

Funder

Federal Ministry of Education and Research, Germany

Publisher

MDPI AG

