Abstract
Electronic health records (EHRs) represent a major repository of real-world clinical trajectories, interventions and outcomes. While modern enterprise EHR systems aim to capture data in structured, standardised formats, a substantial proportion of the information in the EHR is still recorded only as unstructured text and can be transformed into structured codes only by manual processes. Recently, Natural Language Processing (NLP) algorithms have reached a level of performance suitable for large-scale, accurate information extraction from clinical text. Here we describe the application of open-source named-entity recognition and linkage (NER+L) methods (CogStack, MedCAT) to the entire text content of a large UK hospital trust (King's College Hospital, London). The resulting dataset contains 157M SNOMED concepts generated from 9.5M documents for 1.07M patients over a period of 9 years. We present a summary of prevalence and disease onset, as well as a patient embedding that captures major comorbidity patterns at scale. NLP has the potential to transform the health data lifecycle through large-scale automation of a traditionally manual task.
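To illustrate the core NER+L idea the abstract describes, the sketch below matches surface forms in free text and links them to SNOMED CT concept identifiers via a toy dictionary lookup. This is a simplified assumption-laden illustration, not the authors' pipeline: production tools such as MedCAT add context detection, disambiguation, negation handling and self-supervised training, and the concept dictionary and codes here are illustrative placeholders.

```python
import re

# Toy concept dictionary: surface form -> SNOMED CT concept ID.
# Placeholder entries for illustration only; a real vocabulary
# would hold hundreds of thousands of forms and synonyms.
CONCEPTS = {
    "myocardial infarction": "22298006",
    "type 2 diabetes": "44054006",
    "hypertension": "38341003",
}

def ner_link(text: str) -> list[tuple[str, str]]:
    """Return (matched surface form, linked concept ID) pairs.

    Naive exact matching after lower-casing; no disambiguation
    or negation detection, unlike a full NER+L system.
    """
    lowered = text.lower()
    hits = []
    for form, concept_id in CONCEPTS.items():
        for m in re.finditer(re.escape(form), lowered):
            hits.append((text[m.start():m.end()], concept_id))
    return hits

note = "Patient with Type 2 diabetes and hypertension, no myocardial infarction."
print(ner_link(note))
```

Run over a document store, each note yields a set of coded concepts per patient, which is the structured representation that downstream prevalence summaries and patient embeddings are built from.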
Publisher
Public Library of Science (PLoS)
Cited by
8 articles.