Natural language processing for disease phenotyping in UK primary care records for research: a pilot study in myocardial infarction and death

Published: 2019-11
Volume: 10, Issue: S1
ISSN: 2041-1480
Journal: Journal of Biomedical Semantics
Language: English
Journal abbreviation: J Biomed Semant

Authors

Shah Anoop D., Bailey Emily, Williams Tim, Denaxas Spiros, Dobson Richard, Hemingway Harry

Abstract

Background

Free text in electronic health records (EHR) may contain phenotypic information beyond what is captured in structured (coded) data. For major health events such as heart attack and death, few studies have evaluated how much information free text in the primary care record might add. Our objectives were to describe the contribution of free text in primary care to the recording of information about myocardial infarction (MI), including subtype, left ventricular function, laboratory results and symptoms, and to the recording of cause of death. We used the CALIBER EHR research platform, which contains primary care data from the Clinical Practice Research Datalink (CPRD) linked to hospital admission data, the MINAP registry of acute coronary syndromes and the death registry. In CALIBER we randomly selected 2000 patients with MI and 1800 deaths. We implemented a rule-based natural language processing engine, the Freetext Matching Algorithm, on site at CPRD to analyse free text in the primary care record without releasing raw data to researchers. We analysed text recorded within 90 days before or after the MI, and text recorded on or after the date of death.

Results

We extracted 10,927 diagnoses, 3658 test results, 3313 statements of negation and 850 suspected diagnoses from the records of patients with MI. Including free text increased the proportion of patients with recorded chest pain in the week before MI from 19 to 27%, and differentiated between MI subtypes in a quarter more patients than structured data alone. Cause of death was incompletely recorded in primary care: the cause was in coded data for 36% of patients and in free text for 21%. Only 47% of patients had exactly the same cause of death recorded in primary care and in the death registry, and this agreement did not differ between coded and free-text causes of death.

Conclusions

Among patients who have an MI or die, unstructured free text in primary care records contains much information that is potentially useful for research, such as symptoms, investigation results and specific diagnoses. Access to large-scale unstructured data in electronic health records (covering millions of patients) might yield important insights.
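The abstract combines two steps: restricting free text to a time window around the index event (90 days either side of the MI, or on or after the date of death) and using rule-based matching to separate positive diagnoses from negated and suspected mentions. The sketch below illustrates both ideas under stated assumptions. It is not the Freetext Matching Algorithm itself, whose dictionaries and rules are not reproduced here; the `Entry` record, the five-term lexicon, the cue words and the clause-splitting rule are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Window taken from the abstract: 90 days before or after the MI date.
WINDOW = timedelta(days=90)

# Hypothetical lexicon and cue patterns (illustrative only, not the FMA's).
TERMS = {
    "chest pain": "chest pain",
    "myocardial infarction": "MI",
    "mi": "MI",
    "nstemi": "NSTEMI",
    "stemi": "STEMI",
}
NEGATION = re.compile(r"\b(no|not|denies)\b")
SUSPECTED = re.compile(r"\?|\b(possible|suspected|query)\b")


@dataclass
class Entry:
    """Hypothetical free-text record: entry date plus note text."""
    entry_date: date
    text: str


def in_mi_window(entry: Entry, mi_date: date) -> bool:
    """True if the entry falls within 90 days either side of the MI."""
    return mi_date - WINDOW <= entry.entry_date <= mi_date + WINDOW


def on_or_after_death(entry: Entry, death_date: date) -> bool:
    """True if the entry is dated on or after the date of death."""
    return entry.entry_date >= death_date


def classify(text: str) -> list[tuple[str, str]]:
    """Tag each lexicon term as 'positive', 'negated' or 'suspected'.

    The text is split into clauses on '.', ',' and ';'. Cue words are only
    searched for earlier in the same clause as the matched term; word
    boundaries stop 'stemi' from matching inside 'nstemi'.
    """
    found = []
    for clause in re.split(r"[.,;]", text.lower()):
        for term, concept in TERMS.items():
            match = re.search(rf"\b{re.escape(term)}\b", clause)
            if match is None:
                continue
            prefix = clause[: match.start()]
            if NEGATION.search(prefix):
                status = "negated"
            elif SUSPECTED.search(prefix):
                status = "suspected"
            else:
                status = "positive"
            found.append((concept, status))
    return found


if __name__ == "__main__":
    mi = date(2010, 6, 1)
    notes = [
        Entry(date(2010, 5, 28), "Chest pain on exertion; ? MI"),
        Entry(date(2010, 6, 3), "NSTEMI confirmed, no chest pain now"),
        Entry(date(2011, 1, 10), "Review"),  # outside the 90-day window
    ]
    for note in notes:
        if in_mi_window(note, mi):
            print(note.entry_date, classify(note.text))
```

Looking for cues only within the same clause is a deliberately crude scoping rule: it keeps "no chest pain" from negating an earlier "NSTEMI confirmed" in the same note, but a production engine would need far richer handling of negation scope, abbreviations and misspellings.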

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications, Health Informatics, Computer Science Applications, Information Systems

Link

http://link.springer.com/content/pdf/10.1186/s13326-019-0214-4.pdf

Cited by 15 articles

1. Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: A systematic review. Artificial Intelligence in Medicine. 2023-12.

2. Long Covid symptoms and diagnosis in primary care: A cohort study using structured and unstructured data in The Health Improvement Network primary care database. PLOS ONE. 2023-09-26.

3. A systematic review of clinical health conditions predicted by machine learning diagnostic and prognostic models trained or validated using real-world primary health care data. PLOS ONE. 2023-09-08.

4. Long Covid symptoms and diagnosis in primary care: a cohort study using structured and unstructured data in The Health Improvement Network primary care database. 2023-01-09.

5. A survey on clinical natural language processing in the United Kingdom from 2007 to 2022. npj Digital Medicine. 2022-12-21.