Extractive Clinical Question-Answering With Multianswer and Multifocus Questions: Data Set Development and Evaluation Study-Reference-Cited by-同舟云学术

Extractive Clinical Question-Answering With Multianswer and Multifocus Questions: Data Set Development and Evaluation Study

Published:2023-06-20 Issue: Volume:2 Page:e41818
ISSN:2817-1705
Container-title:JMIR AI
language:en
Short-container-title:JMIR AI

Author:

Moon Sungrim^ORCID,He Huan^ORCID,Jia Heling^ORCID,Liu Hongfang^ORCID,Fan Jungwei Wilfred^ORCID

Abstract

Background Extractive question-answering (EQA) is a useful natural language processing (NLP) application for answering patient-specific questions by locating answers in their clinical notes. Realistic clinical EQA can yield multiple answers to a single question and multiple focus points in 1 question, which are lacking in existing data sets for the development of artificial intelligence solutions. Objective This study aimed to create a data set for developing and evaluating clinical EQA systems that can handle natural multianswer and multifocus questions. Methods We leveraged the annotated relations from the 2018 National NLP Clinical Challenges corpus to generate an EQA data set. Specifically, the 1-to-N, M-to-1, and M-to-N drug-reason relations were included to form the multianswer and multifocus question-answering entries, which represent more complex and natural challenges in addition to the basic 1-drug-1-reason cases. A baseline solution was developed and tested on the data set. Results The derived RxWhyQA data set contains 96,939 QA entries. Among the answerable questions, 25% of them require multiple answers, and 2% of them ask about multiple drugs within 1 question. Frequent cues were observed around the answers in the text, and 90% of the drug and reason terms occurred within the same or an adjacent sentence. The baseline EQA solution achieved a best F1-score of 0.72 on the entire data set, and on specific subsets, it was 0.93 for the unanswerable questions, 0.48 for single-drug questions versus 0.60 for multidrug questions, and 0.54 for the single-answer questions versus 0.43 for multianswer questions. Conclusions The RxWhyQA data set can be used to train and evaluate systems that need to handle multianswer and multifocus questions. Specifically, multianswer EQA appears to be challenging and therefore warrants more investment in research. We created and shared a clinical EQA data set with multianswer and multifocus questions that would channel future research efforts toward more realistic scenarios.

Publisher

JMIR Publications Inc.

Reference23 articles.

1. Putting the “why” in “EHR”: capturing and coding clinical cognition

2. Medical Question Answering for Clinical Decision Support

3. Biomedical Question Answering: A Survey of Approaches and Challenges

4. Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset

5. Sequence tagging for biomedical extractive question answering