Author:
Hilbert A, Baskan D, Rieger J, Wagner C, Sehlen S, García-Rudolph A, Kelleher JD, Dengler NF, Kossen T, Madai VI, Frey D
Abstract
Background: With an annual rate of 5.5 million cases, ischemic stroke is the second leading cause of death and permanent disability worldwide posing a significant medical, financial and social burden. Current approaches relax high-risk profiles of imminent stroke to mid- to long-term risk assessment, tempering the importance of immediate preventative action. Claims data may support the development of new risk prediction paradigms for better, individualized management of disease.
Methods: We developed a data-driven paradigm to predict personalized risk of imminent primary ischemic stroke. We used social health insurance data from northeast Germany (between 2008-2018). Stroke events were defined by the presence of an ischemic stroke ICD-10 diagnosis within the available insurance period. Controls (n=150,091) and strokes (n=53,047) were matched by age (mean=76) and insurance length (mean=3 years), resulting in a generally aged, high-risk study population.
We trained traditional and Machine Learning (ML) classifiers to predict the overall likelihood of a primary event based on 55 features including demographic parameters, ICD-10 diagnosis of diseases and dependence on care. Binary ICD-10 features were translated into temporal duration of diagnoses by counting days since the first appearance of disease in the patients’ records. We used SHAP feature importance scores for global and local explanation of model output.
Findings: The best ML model, Tree-boosting, yielded notably high performance with an area under the receiver operating characteristics curve of 0.91, sensitivity of 0.84 and specificity of 0.81. Long duration of hypertension, dyslipidemia and diabetes type 2 were most influential for predicting stroke while frequent dependence on care proved to mitigate stroke risk.
Interpretation: Our proposed data-driven ML approach provides a highly promising direction for improved and personalized prevention and management of imminent stroke, while the developed models offer direct applicability for risk stratification in the north-east German population.
Funding: Horizon2020 (PRECISE4Q, #777107)
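The Methods describe translating binary ICD-10 features into durations by counting days since a diagnosis first appears in a patient's records. The following is a minimal sketch of that idea, not the authors' implementation. The function name `diagnosis_durations`, the reference date `index_date`, the example ICD-10 codes (I10, E11, E78) and the choice to encode an absent diagnosis as 0 are all assumptions made for illustration; the abstract does not specify them.

```python
from datetime import date


def diagnosis_durations(records, index_date, codes):
    """Return days between the first record of each code and index_date.

    records: iterable of (icd10_code, diagnosis_date) pairs for one patient.
    index_date: hypothetical reference date from which durations are counted.
    codes: ICD-10 codes to turn into duration features.
    """
    first_seen = {}
    for code, day in records:
        # Ignore codes we do not track and entries after the reference date.
        if code not in codes or day > index_date:
            continue
        # Keep only the earliest appearance of each diagnosis.
        if code not in first_seen or day < first_seen[code]:
            first_seen[code] = day
    # Assumption: a diagnosis that never appears gets a duration of 0 days.
    return {
        code: (index_date - first_seen[code]).days if code in first_seen else 0
        for code in codes
    }


# Illustrative patient history (made-up dates, not study data).
records = [
    ("I10", date(2016, 6, 1)),    # hypertension, repeat entry
    ("I10", date(2015, 1, 1)),    # hypertension, first entry
    ("E11", date(2017, 12, 22)),  # type 2 diabetes
]
durations = diagnosis_durations(records, date(2018, 1, 1), ["I10", "E11", "E78"])
print(durations)
```

Compared with a binary flag, a duration feature of this kind lets a classifier distinguish a long-standing diagnosis from a recent one, which is the distinction the Findings rely on.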
Publisher
Cold Spring Harbor Laboratory