Affiliation:
1. University of Southampton
Abstract
Population data are crucial for policy decisions, but fine-scale population counts are often unavailable because of the difficulty of sharing sensitive data. Various approaches, such as the Random Forest (RF) model, have been used to disaggregate census data from higher administrative units to small-area scales. A major limitation of the RF model is its inability to quantify the uncertainty associated with the predicted populations, which can be important for policy decisions. In this study, we applied a Bayesian Additive Regression Trees (BART) model to population disaggregation and compared the results with those of an RF model, using both simulated data and the 2021 census data for Ghana. The BART model consistently outperformed the RF model in out-of-sample predictions across all metrics considered, including bias, mean squared error (MSE), and root mean squared error (RMSE). The BART model also addresses this limitation of the RF model by providing uncertainty estimates around the predicted population. Overall, the study demonstrates the superiority of the BART model over the RF model for disaggregating population data and highlights its potential for producing gridded population estimates.
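The comparison above rests on two ingredients: point-error metrics (bias, MSE, RMSE) computed on held-out areas, and uncertainty intervals that BART can provide because it yields posterior draws rather than a single prediction. The sketch below illustrates both under stated assumptions; it is not the authors' code. It assumes bias is defined as the mean of predicted minus observed counts (the study may use a different, e.g. relative, definition), that posterior draws for one area are available as a plain list of numbers, and that a 95% equal-tailed interval is wanted. The function names `error_metrics`, `percentile`, and `credible_interval` are hypothetical.

```python
import math
from statistics import mean


def error_metrics(observed, predicted):
    """Bias, MSE and RMSE of out-of-sample population predictions.

    Assumption: bias = mean(predicted - observed); the study may define it differently.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [p - o for o, p in zip(observed, predicted)]
    bias = mean(residuals)                 # average signed error
    mse = mean(r * r for r in residuals)   # average squared error
    return {"bias": bias, "mse": mse, "rmse": math.sqrt(mse)}


def percentile(draws, q):
    """Percentile (q in [0, 100]) of a list of draws, with linear interpolation."""
    xs = sorted(draws)
    pos = (len(xs) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def credible_interval(draws, level=0.95):
    """Equal-tailed interval from posterior draws for one area (assumed 95% by default)."""
    tail = (1 - level) / 2 * 100
    return percentile(draws, tail), percentile(draws, 100 - tail)


if __name__ == "__main__":
    # Toy held-out areas (illustrative numbers, not study data).
    print(error_metrics([120, 80, 200], [110, 90, 210]))
    # Toy posterior draws for a single area.
    print(credible_interval(list(range(101))))
```

An RF model supplies only the point predictions needed for `error_metrics`; the interval step requires a distribution of predictions per area, which is what a Bayesian model such as BART provides.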
Funder
Bill and Melinda Gates Foundation
Publisher
Research Square Platform LLC