Abstract
Objective: This systematic review aims to assess how information from unstructured clinical text is used to develop and validate prognostic risk prediction models. We summarize the prediction problems and methodological landscape and assess whether using unstructured clinical text data in addition to more commonly used structured data improves prediction performance.
Materials and Methods: We searched Embase, MEDLINE, Web of Science, and Google Scholar to identify studies, published between January 2005 and March 2021, that developed prognostic risk prediction models using unstructured clinical text data. Data items were extracted and analyzed, and a meta-analysis of model performance was carried out to assess the added value of text to structured-data models.
Results: We identified 126 studies that described 145 clinical prediction problems. Combining text and structured data improved model performance compared to using only text or only structured data. In these studies, a wide variety of dense and sparse numeric text representations were combined with both deep learning and more traditional machine learning methods. External validation, public availability, and explainability of the developed models were limited.
Conclusion: Overall, most studies found that using unstructured clinical text data in addition to structured data was beneficial for developing prognostic prediction models. EHR text data is a valuable source of information for prediction model development and should not be neglected. We suggest a future focus on explainability and external validation of the developed models, promoting robust and trustworthy prediction models in clinical practice.
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.