Author:
Wobcke Wayne, Mariyah Siti
Abstract
Recent years have seen increased interest in the use of alternative data sources in the definition and production of official statistics and indicators for the UN Sustainable Development Goals. In this paper, we consider the application of data science to the production of official statistics, using poverty targeting as an illustrative application. We show that machine learning can play a central role in the generation of official statistics by combining several types of data (survey, administrative and alternative). We focus on the problem of poverty targeting using the Proxy Means Test in Indonesia. We first compare a number of existing statistical and machine learning methods. We then introduce new approaches in the spirit of small area estimation that use area-level features and data augmentation at the subdistrict level to develop more refined models at the district level. The methods are evaluated on three districts in Indonesia on the problem of estimating 2020 per capita household expenditure using data from 2016–2019. On the problem of identifying the poorest 40% of the population, the best performing method, XGBoost, reduces inclusion/exclusion errors relative to the commonly used Ridge Regression method by between 4.5% and 13.9% in the districts studied.
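The abstract does not define how inclusion and exclusion errors are computed, so the following is only a minimal sketch under stated assumptions. It assumes that the poorest 40% are identified by ranking households on per capita expenditure, that the same number of households is targeted using the predicted values, that the exclusion error is the share of truly poor households that are not targeted, and that the inclusion error is the share of targeted households that are not truly poor. The function name `targeting_errors` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def targeting_errors(actual, predicted, poor_share=0.40):
    """Return (inclusion_error, exclusion_error) for a rank-based poverty cut-off.

    Assumption (not from the paper): the poorest `poor_share` of households
    by actual expenditure are "truly poor", and the same number of households
    with the lowest predicted expenditure are "targeted".
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same shape")

    n_poor = int(round(poor_share * actual.size))
    if n_poor == 0:
        raise ValueError("poor_share selects no households")

    # Mark the n_poor households with the lowest actual expenditure.
    truly_poor = np.zeros(actual.size, dtype=bool)
    truly_poor[np.argsort(actual, kind="stable")[:n_poor]] = True

    # Mark the n_poor households with the lowest predicted expenditure.
    targeted = np.zeros(actual.size, dtype=bool)
    targeted[np.argsort(predicted, kind="stable")[:n_poor]] = True

    # Exclusion error: truly poor households that the model fails to target.
    exclusion = np.sum(truly_poor & ~targeted) / n_poor
    # Inclusion error: targeted households that are not truly poor.
    inclusion = np.sum(targeted & ~truly_poor) / n_poor
    return float(inclusion), float(exclusion)


# Toy data for ten hypothetical households (arbitrary expenditure units).
actual = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
predicted = [1, 6, 3, 2, 5, 4, 7, 8, 9, 10]
inclusion_error, exclusion_error = targeting_errors(actual, predicted)
```

Because the same number of households is targeted as is truly poor under these assumptions, the two error rates coincide, which is consistent with the abstract reporting a single combined "inclusion/exclusion" figure. Predictions from any model, such as Ridge Regression or XGBoost, could be passed as `predicted`.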
Subject
Statistics, Probability and Uncertainty; Economics and Econometrics; Management Information Systems