Affiliation:
1. University of Huddersfield, UK
Abstract
Background: Increasing diabetes prevalence is a major public health concern. In this study we ask whether linked open data can be used to predict prescription volumes of drugs used in the treatment of diabetes across small geographies of England.
Methods: We propose and demonstrate a methodology that uses publicly available open data to infer the geospatial distribution of drugs prescribed for diabetes at the Lower Layer Super Output Area (LSOA) level. Multiple datasets are acquired, processed, and linked, enabling a more in-depth analysis. Combining these linked datasets with published deprivation indicators for geographies across England, we build highly predictive regression models.
Results: Regression models were trained to predict diabetes prescribing volumes from the deprivation indicators of geographies across England. Models built with data covering the city of Bradford, England, achieved a predicted-versus-actual R² of 0.672 using multiple linear regression and 0.775 using the Least Absolute Shrinkage and Selection Operator (LASSO). Median age and air quality factors proved to be significant markers for diabetes prescribing.
Conclusions: The results of this study suggest our methodology is robust and accurate. Such predictive models are useful to health authorities in light of the increasing costs and prevalence of diabetes, while the use of publicly available open data avoids issues of data privacy.
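The modelling step named in the Methods and Results can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the three feature columns are hypothetical stand-ins for area-level indicators, the penalty value `lam` is an arbitrary assumption, and R² is computed here as the coefficient of determination, which is one common reading of a predicted-versus-actual R². The sketch fits a LASSO model by coordinate descent using only the Python standard library and shows the property that makes LASSO useful for indicator selection: coefficients of uninformative features are shrunk to exactly zero.

```python
import random


def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * sum_i (y_i - b0 - x_i . b)^2 + lam * sum_j |b_j|
    by cyclic coordinate descent. The intercept b0 is not penalised."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # sum of squares of feature j
            for i in range(n):
                pred_without_j = intercept + sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
        # Re-estimate the intercept as the mean residual.
        intercept = sum(
            y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)
        ) / n
    return intercept, beta


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic data (an assumption for illustration, not the study's data):
# one row per hypothetical small area, three hypothetical indicators.
# Only the first two indicators influence the response.
random.seed(0)
X = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0, 0.1) for row in X]

intercept, beta = lasso_fit(X, y, lam=0.1)
predictions = [intercept + sum(b * v for b, v in zip(beta, row)) for row in X]
r2 = r_squared(y, predictions)

print("coefficients:", beta)
print("R^2:", r2)
```

In this sketch the third coefficient is driven to zero because that feature carries no signal, while the first two are retained with some shrinkage. On real linked data, the penalty would normally be chosen by cross-validation rather than fixed by hand, and R² would be reported on held-out areas.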