BACKGROUND
The opioid epidemic in the United States has precipitated a need for public health agencies to better understand risk factors associated with fatal overdoses. Matching person-level information stored in public health, medical, and human services datasets can enhance the understanding of opioid overdose risk factors and interventions. A major impediment to using datasets from separate agencies, has been the lack of a cross-organization unique identifier. Although different matching techniques that leverage patient demographic information can be used, the impact of using a particular matching approach is not well understood.
OBJECTIVE
This study compares the impact of using probabilistic versus deterministic matching algorithms to link disparate datasets together for identifying persons at risk of a fatal overdose.
METHODS
This study used statewide prescription drug monitoring program (PDMP), arrest, and mortality data matched at the person-level using a probabilistic and two deterministic matching algorithms. Impact of matching was assessed by comparing the prevalence of key risk indicators, the outcome, and performance of a multivariate logistic regression for fatal overdose using the combined datasets.
RESULTS
The probabilistically matched population had the highest degree of matching within the PDMP data and with arrest and mortality data, resulting in the highest prevalence of high-risk indicators and the outcome. Model performance using area under the curve (AUC) was comparable across the algorithms (probabilistic: 0.847; deterministic-basic: 0.854; deterministic+zip: 0.826), but demonstrated tradeoffs between sensitivity and specificity.
CONCLUSIONS
The probabilistic algorithm was more successful in linking patients with PDMP data with death and arrest data, resulting in a larger at-risk population. However, deterministic-basic matching may be a suitable option for understanding high-level risk based on the model’s area under the curve (0.854). The clinical use case should be considered when selecting a matching approach, as probabilistic algorithms can be more resource-intensive and costly to maintain compared with deterministic algorithms.