Assessment of machine learning algorithms in national data to classify the risk of self-harm among young adults in hospital: a retrospective study

Published: 2022-08-10

Author:

Arora Anmol, Bojko Louis, Kumar Santosh, Lillington Joseph, Panesar Sukhmeet, Petrungaro Bruno

Abstract

Summary

Background: Self-harm is one of the most common presentations at accident and emergency departments in the UK and is a strong predictor of suicide risk. The UK Government has prioritised identifying risk factors and developing preventative strategies for self-harm. Machine learning offers a potential method to identify complex patterns with predictive value for the risk of self-harm.

Methods: National data in the UK Mental Health Services Data Set were isolated for patients aged 18‒30 years who started a mental health hospital admission between Aug 1, 2020 and Aug 1, 2021, and had been discharged by Jan 1, 2022. Data were obtained on age group, gender, ethnicity, employment status, marital status, accommodation status, and source of admission to hospital and used to construct seven machine learning models that were used individually and as an ensemble to predict hospital stays that would be associated with a risk of self-harm.

Outcomes: The training dataset included 23 808 items (including 1081 episodes of self-harm) and the testing dataset 5951 items (including 270 episodes of self-harm). The best performing algorithms were the random forest model (AUC-ROC 0.70, 95% CI 0.66-0.74) and the ensemble model (AUC-ROC 0.77, 95% CI 0.75-0.79).

Interpretation: Machine learning algorithms could predict hospital stays with a high risk of self-harm based on readily available data that are routinely collected by health providers and recorded in the Mental Health Services Data Set. The findings should be validated externally with other real-world data.

Funding: This study was supported by the Midlands and Lancashire Commissioning Support Unit.

Research in context

Evidence before this study: Despite self-harm being repeatedly labelled as a national priority for psychiatric healthcare research, it remains challenging for clinicians to stratify the risk of self-harm in patients. National guidelines have highlighted deficiencies in care and attention is being paid towards the use of large datasets to develop evidence-based risk stratification strategies. However, many of the tools so far developed rely upon elements of the patient’s clinical history, which requires well curated datasets at a population level and previous engagement with care services at an individual level. Reliance upon elements of a patient’s clinical history also risks biasing against patients with missing data or against hospitals where data is poorly recorded.

Added value of this study: In this study, we use commissioning data that is routinely collected in the United Kingdom by healthcare providers with each hospital admission. Of the variables that were available for analysis, recursive feature elimination optimised our variable selection to include only age group, source of hospital admission, gender, and employment status. Machine learning algorithms were able to predict hospital episodes in which patients self-harmed in the majority of cases using a national dataset. Random forest and ensemble machine learning methods were the best-performing models. Sensitivity and specificity at predicting self-harm occurrence were 0.756 and 0.596, respectively, for the random forest model and 0.703 and 0.730 for the ensemble model. To our knowledge, this is the first study of its kind and represents an advance in the prediction of inpatient self-harm by limiting the amount of information required to make predictions to that which would be near-universally available at the point of the admission, nationally.

Implications of all the available evidence: There is a role for machine learning to be used to stratify the risk of self-harm when patients are admitted to mental health facilities, using only commissioning data that is easily accessible at the point of care. External validation of these findings is required, as whilst the algorithms were tested on a large sample of national data, there remains a need for prospective studies to assess the real-world application of such machine learning models.
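
To give a feel for what the reported sensitivity and specificity would mean in absolute numbers, the sketch below converts them into approximate confusion-matrix counts. It assumes, which the abstract does not state explicitly, that the rates were measured on the testing dataset (5951 items, 270 self-harm episodes). The function name `implied_counts` and the derived positive predictive value (PPV) are illustrative additions, not figures from the study.

```python
# Reported in the abstract: testing dataset size and self-harm episodes.
TEST_ITEMS = 5951
TEST_SELF_HARM = 270

# Reported (sensitivity, specificity) for the two best-performing models.
REPORTED = {
    "random forest": (0.756, 0.596),
    "ensemble": (0.703, 0.730),
}


def implied_counts(sensitivity, specificity, positives, total):
    """Approximate confusion-matrix counts implied by the reported rates.

    Assumption (not stated in the source): the rates were measured on the
    testing dataset, so they can be applied to its class counts directly.
    Counts are rounded, so they are estimates rather than study results.
    """
    negatives = total - positives
    tp = round(sensitivity * positives)   # self-harm episodes flagged
    fn = positives - tp                   # self-harm episodes missed
    tn = round(specificity * negatives)   # other stays correctly not flagged
    fp = negatives - tn                   # other stays flagged in error
    ppv = tp / (tp + fp)                  # share of flagged stays with self-harm
    return {"tp": tp, "fn": fn, "tn": tn, "fp": fp, "ppv": ppv}


if __name__ == "__main__":
    for name, (sens, spec) in REPORTED.items():
        c = implied_counts(sens, spec, TEST_SELF_HARM, TEST_ITEMS)
        print(f"{name}: TP={c['tp']} FN={c['fn']} TN={c['tn']} "
              f"FP={c['fp']} PPV={c['ppv']:.3f}")
```

Under that assumption, most flagged stays would not involve self-harm, because self-harm episodes make up only about 4.5% of the testing dataset (270 of 5951). This is one reason the prospective validation called for above matters before any use at the point of care.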

Publisher

Cold Spring Harbor Laboratory
