Abstract
Background
Suicide is a significant public health issue. Many risk prediction tools have been developed to estimate an individual’s risk of suicide. Risk prediction models can go beyond individual risk assessment; one important application of risk prediction models is population health planning. Suicide is a result of the interaction among the risk and protective factors at the individual, health care system, and community levels. Thus, policy and decision makers can play an important role in suicide prevention. However, few prediction models for the population risk of suicide have been developed.
Objective
This study aims to develop and validate prediction models for the population risk of suicide using health administrative data, considering individual-, health system–, and community-level predictors.
Methods
We used a case-control study design to develop sex-specific risk prediction models for suicide, using the health administrative data in Quebec, Canada. The training data included all suicide cases (n=8899) that occurred from January 1, 2002, to December 31, 2010. The control group was a 1% random sample of living individuals in each year between January 1, 2002, and December 31, 2010 (n=645,590). Logistic regression was used to develop the prediction models based on individual-, health care system–, and community-level predictors. The developed model was converted into synthetic estimation models, which concerted the individual-level predictors into community-level predictors. The synthetic estimation models were directly applied to the validation data from January 1, 2011, to December 31, 2019. We assessed the performance of the synthetic estimation models with four indicators: the agreement between predicted and observed proportions of suicide, mean average error, root mean square error, and the proportion of correctly identified high-risk regions.
Results
The sex-specific models based on individual data had good discrimination (male model: C=0.79; female model: C=0.85) and calibration (Brier score for male model 0.01; Brier score for female model 0.005). With the regression-based synthetic models applied in the validation data, the absolute differences between the synthetic risk estimates and observed suicide risk ranged from 0% to 0.001%. The root mean square errors were under 0.2. The synthetic estimation model for males correctly predicted 4 of 5 high-risk regions in 8 years, and the model for females correctly predicted 4 of 5 high-risk regions in 5 years.
Conclusions
Using linked health administrative databases, this study demonstrated the feasibility and the validity of developing prediction models for the population risk of suicide, incorporating individual-, health system–, and community-level variables. Synthetic estimation models built on routinely collected health administrative data can accurately predict the population risk of suicide. This effort can be enhanced by timely access to other critical information at the population level.