Implementation of a Cohort Retrieval System for Clinical Data Repositories Using the Observational Medical Outcomes Partnership Common Data Model: Proof-of-Concept System Validation (Preprint)

Published: 2019-12-10

Author:

Liu Sijia, Wang Yanshan, Wen Andrew, Wang Liwei, Hong Na, Shen Feichen, Bedrick Steven, Hersh William, Liu Hongfang

Abstract

BACKGROUND

Widespread adoption of electronic health records has enabled the secondary use of electronic health record data for clinical research and health care delivery. Natural language processing techniques have shown promise in their capability to extract the information embedded in unstructured clinical data, and information retrieval techniques provide flexible and scalable solutions that can augment natural language processing systems for retrieving and ranking relevant records.

OBJECTIVE

In this paper, we present the implementation of Cohort Retrieval Enhanced by Analysis of Text from Electronic Health Records (CREATE), a cohort retrieval system that can execute textual cohort selection queries on both structured data and unstructured text.

METHODS

CREATE is a proof-of-concept system that combines structured queries with information retrieval techniques applied to natural language processing results in order to improve cohort retrieval performance. It is built on the Observational Medical Outcomes Partnership Common Data Model to enhance model portability. The natural language processing component was used to extract common data model concepts from textual queries. We designed a hierarchical index to support common data model concept search using information retrieval techniques and frameworks.

RESULTS

Our case study on 5 cohort identification queries, evaluated using the precision at 5 information retrieval metric at both the patient level and the document level, demonstrates that CREATE achieves a mean precision at 5 of 0.90. This outperforms systems that use only structured data (mean precision at 5 of 0.54) or only unstructured text (mean precision at 5 of 0.74).
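For readers unfamiliar with the metric, precision at 5 is the fraction of the top 5 ranked results that are judged relevant, and the mean is taken over queries. The following is a minimal sketch of that computation, not the authors' evaluation code. The function names and the relevance judgments are hypothetical and are not data from the study. Dividing by k even when fewer than k results are returned is an assumption, since the abstract does not say how short result lists were handled.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[bool], k: int = 5) -> float:
    """Fraction of the top-k ranked results that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = relevance[:k]
    # Assumption: divide by k (not len(top_k)), so a query that returns
    # fewer than k results is penalized for the missing ranks.
    return sum(bool(r) for r in top_k) / k


def mean_precision_at_k(runs: Sequence[Sequence[bool]], k: int = 5) -> float:
    """Average precision at k over a set of queries."""
    if not runs:
        raise ValueError("at least one query is required")
    return sum(precision_at_k(run, k) for run in runs) / len(runs)


# Hypothetical relevance judgments for three queries, in ranked order.
# These are illustrative only and do not come from the paper.
example_runs = [
    [True, True, True, True, True],
    [True, True, False, True, True],
    [True, False, True, True, False],
]

print(mean_precision_at_k(example_runs, k=5))
```

The same functions apply at either the patient level or the document level; only the unit being ranked and judged changes.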

CONCLUSIONS

The implementation and evaluation on Mayo Clinic Biobank data demonstrated that CREATE outperforms cohort retrieval systems that use only structured data or only unstructured text for complex textual cohort queries.

Publisher

JMIR Publications Inc.

Cited by 1 article.

1. Transformation of Pathology Reports Into the Common Data Model With Oncology Module: Use Case for Colon Cancer (Preprint); 2020-03-04