Author:
Kurvits Siim, Harro Ainika, Reigo Anu, Ott Anne, Laur Sven, Särg Dage, Tampuu Ardi, Alasoo Kaur, Vilo Jaak, Milani Lili, Haller Toomas
Abstract
Background
Ischemic stroke (IS) is a major health risk for which no effective, generally applicable measures of primary prevention exist. Early warning signals that are easy to detect and widely available could save lives. Estonia has a single nationwide Electronic Health Record (EHR) database that stores patients' medical information from hospitals and primary care providers.
Methods
We extracted structured and unstructured data from the EHRs of Estonian Biobank (EstBB) participants and evaluated different input data formats to understand how this continuously growing dataset should be prepared for the best predictive performance. We demonstrated the utility of the EHR database for finding blood- and urine-based biomarkers for IS by applying different analytical and machine learning (ML) methods.
Results
Several early trends in common clinical laboratory parameters (a set of red blood cell indices, the lymphocyte/neutrophil ratio, etc.) were established as predictors of IS. The developed ML models predicted the future occurrence of IS with very high accuracy, and Random Forests proved to be the most suitable method for EHR data.
Conclusions
We conclude that the EHR database and the risk factors uncovered are valuable resources for screening the population for risk of IS, as well as for constructing disease risk scores and refining ML-based prediction models for IS.
Funder
European Regional Development Fund
Horizon 2020 Framework Programme
IT tippkeskus EXCITE
Eesti Teadusagentuur
Publisher
Springer Science and Business Media LLC
Cited by
3 articles.