Assessing the Performance of Clinical Natural Language Processing Systems: Development of an Evaluation Methodology

Published: 2021-07-23 Issue: 7 Volume: 9 Page: e20492
ISSN: 2291-9694
Container-title: JMIR Medical Informatics
Language: en
Short-container-title: JMIR Med Inform

Author:

Canales Lea, Menke Sebastian, Marchesseau Stephanie, D’Agostino Ariel, del Rio-Bermudez Carlos, Taberna Miren, Tello Jorge

Abstract

Background: Clinical natural language processing (cNLP) systems are of crucial importance due to their increasing capability in extracting clinically important information from free text contained in electronic health records (EHRs). The conversion of a nonstructured representation of a patient’s clinical history into a structured format enables medical doctors to generate clinical knowledge at a level that was not possible before. Finally, interpreting the insights provided by cNLP systems has great potential to drive decisions about clinical practice. However, carrying out robust evaluations of those cNLP systems is a complex task that is hindered by a lack of standard guidance on how to systematically approach them.

Objective: Our objective was to offer natural language processing (NLP) experts a methodology for the evaluation of cNLP systems to assist them in carrying out this task. By following the proposed phases, the robustness and representativeness of the performance metrics of their own cNLP systems can be assured.

Methods: The proposed evaluation methodology comprised five phases: (1) the definition of the target population, (2) the statistical document collection, (3) the design of the annotation guidelines and annotation project, (4) the external annotations, and (5) the cNLP system performance evaluation. We presented the application of all phases to evaluate the performance of a cNLP system called “EHRead Technology” (developed by Savana, an international medical company), applied in a study on patients with asthma. As part of the evaluation methodology, we introduced the Sample Size Calculator for Evaluations (SLiCE), a software tool that calculates the number of documents needed to achieve a statistically useful and resourceful gold standard.

Results: The application of the proposed evaluation methodology to a real use-case study of patients with asthma revealed the benefit of the different phases for cNLP system evaluations. By using SLiCE to adjust the number of documents needed, a meaningful and resourceful gold standard was created. In the presented use case, using as few as 519 EHRs, it was possible to evaluate the performance of the cNLP system and obtain performance metrics for the primary variable within the expected CIs.

Conclusions: We showed that our evaluation methodology can offer guidance to NLP experts on how to approach the evaluation of their cNLP systems. By following the five phases, NLP experts can assure the robustness of their evaluation and avoid unnecessary investment of human and financial resources. Besides the theoretical guidance, we offer SLiCE as an easy-to-use, open-source Python library.
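The abstract does not state how SLiCE derives the number of documents, so the following is only a minimal sketch of the general idea of sizing a gold standard from an expected metric value and a target confidence interval width. It uses the textbook normal-approximation (Wald) formula for a proportion. The function names `required_sample_size` and `wald_interval` are hypothetical and are not SLiCE's API, and SLiCE's actual method may differ.

```python
import math
from statistics import NormalDist


def required_sample_size(expected_metric: float, half_width: float,
                         confidence: float = 0.95) -> int:
    """Smallest n whose Wald interval half-width is at most `half_width`.

    Hypothetical helper (not part of SLiCE). `expected_metric` is the
    anticipated value of a proportion-type metric such as precision or recall.
    """
    if not 0.0 < expected_metric < 1.0:
        raise ValueError("expected_metric must lie strictly between 0 and 1")
    if half_width <= 0.0:
        raise ValueError("half_width must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # Solve z * sqrt(p * (1 - p) / n) <= half_width for n.
    n = z ** 2 * expected_metric * (1.0 - expected_metric) / half_width ** 2
    return math.ceil(n)


def wald_interval(successes: int, n: int,
                  confidence: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval for an observed proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * math.sqrt(p * (1.0 - p) / n)
    # Clip to [0, 1], since a proportion cannot leave that range.
    return max(0.0, p - margin), min(1.0, p + margin)


if __name__ == "__main__":
    # Instances needed for a +/-5 point interval at 95% confidence when
    # nothing is known in advance (the worst case, p = 0.5).
    print(required_sample_size(0.5, 0.05))
    # Interval actually obtained from 90 correct out of 100 annotated instances.
    print(wald_interval(90, 100))
```

In this sketch, n counts the instances over which the metric is computed: system-detected mentions for precision, gold-standard mentions for recall. Converting that into a number of documents requires an additional assumption about how often the variable appears per document.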

Publisher

JMIR Publications Inc.

Subject

Health Information Management, Health Informatics

References: 59 articles.

1. Finding the Missing Link for Big Biomedical Data

2. Language, Structure, and Reuse in the Electronic Health Record

3. Savana: Re-using Electronic Health Records with Artificial Intelligence

4. Towards a symbiotic relationship between big data, artificial intelligence, and hospital pharmacy

5. Natural Language Processing and the Representation of Clinical Data

Cited by 47 articles.

1. Prediction model for major bleeding in anticoagulated patients with cancer-associated venous thromboembolism using machine learning and natural language processing; Clinical and Translational Oncology; 2024-09-14

2. Towards full recovery with lurasidone: effective doses in the treatment of agitation, affective, positive and cognitive symptoms in schizophrenia, and dual psychosis; Drugs in Context; 2024-08-05

3. Botulinum Toxin Type A (BoNT-A) Use for Post-Stroke Spasticity: A Multicenter Study Using Natural Language Processing and Machine Learning; Toxins; 2024-08-02

4. Tratamiento anticoagulante oral en la fibrilación auricular: AFIRMA, el estudio de vida real realizado mediante procesamiento de lenguaje natural y aprendizaje automático; Revista Clínica Española; 2024-08

5. Oral anticoagulant treatment in atrial fibrillation: the AFIRMA real-world study using natural language processing and machine learning; Revista Clínica Española (English Edition); 2024-08