Development and application of a high throughput natural language processing architecture to convert all clinical documents in a clinical data warehouse into standardized medical vocabularies

Published:2019-05-30 Issue:11 Volume:26 Page:1364-1369
ISSN:1067-5027
Container-title:Journal of the American Medical Informatics Association
language:en

Author:

Afshar Majid¹², Dligach Dmitriy¹²³, Sharma Brihat³, Cai Xiaoyuan⁴, Boyda Jason⁴, Birch Steven⁴, Valdez Daniel⁴, Zelisko Suzan⁴, Joyce Cara¹², Modave François¹², Price Ron¹⁴

Affiliation:

1. Center for Health Outcomes and Informatics Research, Health Sciences Division, Loyola University Chicago, Maywood, Illinois, USA

2. Department of Public Health Sciences, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, USA

3. Department of Computer Science, Loyola University, Chicago, Illinois, USA

4. Informatics and Systems Development, Health Sciences Division, Loyola University Chicago, Maywood, Illinois, USA

Abstract

Objective: Natural language processing (NLP) engines such as the clinical Text Analysis and Knowledge Extraction System (cTAKES) are a solution for processing notes for research, but optimizing their performance for a clinical data warehouse (CDW) remains a challenge. We aim to develop a high throughput NLP architecture using cTAKES and present a predictive model use case.

Materials and Methods: The CDW comprised 1 103 038 patients across 10 years. The architecture was constructed using the Hadoop data repository for source data and 3 large-scale symmetric processing servers for NLP. Each named entity mention in a clinical document was mapped to the Unified Medical Language System concept unique identifier (CUI).

Results: The NLP architecture processed 83 867 802 clinical documents in 13.33 days and produced 37 721 886 606 CUIs across 8 standardized medical vocabularies. Performance of the architecture exceeded 500 000 documents per hour across 30 parallel instances of cTAKES, including 10 instances dedicated to documents greater than 20 000 bytes. In a use case example for predicting 30-day hospital readmission, a CUI-based model had similar discrimination to n-grams, with an area under the receiver operating characteristic curve of 0.75 (95% CI, 0.74–0.76).

Discussion and Conclusion: Our health system’s high throughput NLP architecture may serve as a benchmark for large-scale clinical research using a CUI-based approach.
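The figures quoted in the abstract can be put on a common scale with a little arithmetic. The sketch below is only an illustration, not the authors' code: it derives the mean throughput over the full 13.33-day run and the mean number of CUIs per document, and it shows one hypothetical way to route documents by size using the 20 000-byte threshold. The function names, the pool labels "large" and "standard", and the choice of UTF-8 for measuring size are assumptions made for this example.

```python
# Figures quoted in the abstract.
DOCUMENTS = 83_867_802
CUIS = 37_721_886_606
RUN_DAYS = 13.33
LARGE_DOC_BYTES = 20_000  # size threshold quoted in the abstract


def average_docs_per_hour(documents: int, days: float) -> float:
    """Mean throughput over the whole run, in documents per hour."""
    return documents / (days * 24)


def average_cuis_per_document(cuis: int, documents: int) -> float:
    """Mean number of CUIs produced per processed document."""
    return cuis / documents


def pick_pool(text: str, threshold: int = LARGE_DOC_BYTES) -> str:
    """Hypothetical router: send documents larger than the threshold
    to a dedicated pool. Size is measured in UTF-8 bytes, which is an
    assumption; the abstract does not state the encoding."""
    size = len(text.encode("utf-8"))
    return "large" if size > threshold else "standard"


if __name__ == "__main__":
    print(f"{average_docs_per_hour(DOCUMENTS, RUN_DAYS):,.0f} documents/hour")
    print(f"{average_cuis_per_document(CUIS, DOCUMENTS):,.1f} CUIs/document")
    print(pick_pool("x" * 25_000))
```

Averaged over the whole run, this works out to roughly 262 000 documents per hour and about 450 CUIs per document. The mean rate is lower than the 500 000 documents per hour reported for the 30 parallel instances; the abstract does not explain the difference, so the two numbers should not be assumed to measure the same thing.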

Funder

NIH

Publisher

Oxford University Press (OUP)

Subject

Health Informatics

Link

http://academic.oup.com/jamia/article-pdf/26/11/1364/36089053/ocz068.pdf

Reference: 27 articles.

1. Extracting information from the text of electronic medical records to improve case detection: a systematic review;Ford;J Am Med Inform Assoc,2016

2. Extracting information from textual documents in the electronic health record: a review of recent research;Meystre;Yearb Med Inform,2008

3. Development and validation of a natural language processing tool to identify patients treated for pneumonia across VA emergency departments;Jones;Appl Clin Inform,2018