Affiliation:
1. Department of Geography, University College London, London, UK
Abstract
To minimize the disclosure of personal information, sensitive location data collected by mobile phones is often aggregated to predefined geographic units and presented as counts of devices at a given time. Grids, or units created by statistical agencies for the dissemination of traditional data sets such as censuses, are common choices for this aggregation. However, these units can vary widely in the number of devices they contain, leading to over‐generalization and a loss of information in some areas. To alleviate this issue, we propose a new method for aggregating mobile phone‐generated location data sets that creates bespoke geometries which maximize the granularity of the data whilst minimizing the risk of disclosing personal information. The resulting small areas are built on Uber's H3 hexagonal indexing system: activity counts and land‐use features are attributed to each cell, and cells are then merged into geographies that contain a predetermined number of data points and respect the underlying topography and land use. The methodology is applicable to widely available data sets and enables bespoke geographical units to be created for different contexts. We compare the generated units to established aggregates from the England and Wales Census and Ordnance Survey, and demonstrate that our outputs are more representative of the original mobile phone data set and minimize data omission caused by low counts. This speaks to the need for a data‐driven and context‐driven regionalization methodology.
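The merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple greedy, breadth-first strategy, and the names `merge_cells`, `counts`, `neighbours`, and `threshold` are hypothetical. Cells are represented by plain string identifiers rather than real H3 indexes, and the topography and land-use constraints are not modelled, although one could approximate them by omitting adjacency links that cross a barrier.

```python
from collections import deque


def merge_cells(counts, neighbours, threshold):
    """Greedily merge adjacent cells until each region holds >= threshold points.

    counts     -- dict mapping cell id to its activity count (hypothetical input)
    neighbours -- dict mapping cell id to a list of adjacent cell ids
    threshold  -- minimum number of data points wanted per region
    Returns a dict mapping each cell id to an integer region id.
    """
    region_of = {}
    totals = []
    # Seed regions from the sparsest cells first so they are absorbed early.
    for seed in sorted(counts, key=lambda c: (counts[c], c)):
        if seed in region_of:
            continue
        rid = len(totals)
        region_of[seed] = rid
        total = counts[seed]
        frontier = deque([seed])
        # Grow outwards over unassigned neighbours until the threshold is met.
        while total < threshold and frontier:
            cell = frontier.popleft()
            for nb in neighbours.get(cell, ()):
                if total >= threshold:
                    break
                if nb in counts and nb not in region_of:
                    region_of[nb] = rid
                    total += counts[nb]
                    frontier.append(nb)
        totals.append(total)
    # Fold any region that is still undersized into an adjacent region.
    for rid, total in enumerate(totals):
        if total >= threshold:
            continue
        members = [c for c, r in region_of.items() if r == rid]
        target = next(
            (region_of[nb] for c in members for nb in neighbours.get(c, ())
             if nb in region_of and region_of[nb] != rid),
            None,
        )
        if target is None:
            continue  # isolated region: left undersized in this sketch
        for c in members:
            region_of[c] = target
        totals[target] += total
        totals[rid] = 0
    return region_of


# Toy example: six cells in a row (a-b-c-d-e-f) with made-up counts.
counts = {"a": 3, "b": 4, "c": 10, "d": 2, "e": 1, "f": 12}
neighbours = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d", "f"], "f": ["e"],
}
regions = merge_cells(counts, neighbours, threshold=10)

# Sum the counts per region to check that every region meets the threshold.
region_totals = {}
for cell, rid in regions.items():
    region_totals[rid] = region_totals.get(rid, 0) + counts[cell]
```

In this sketch a region may overshoot the threshold, because whole cells are added one at a time. A fuller treatment would also balance region sizes and weigh land-use similarity when choosing which neighbour to absorb.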
Funder
Economic and Social Research Council