Affiliation:
1. Department of Biomedical Informatics, Columbia University, New York, New York, USA
2. Department of Statistics, Columbia University, New York, New York, USA
3. Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
Abstract
Objective
To develop a computable representation for medical evidence and to contribute a gold standard dataset of annotated randomized controlled trial (RCT) abstracts, along with a natural language processing (NLP) pipeline for transforming free-text RCT evidence in PubMed into the structured representation.
Materials and methods
Our representation, EvidenceMap, consists of 3 levels of abstraction (Medical Evidence Entity, Proposition, and Map) that capture the hierarchical composition of medical evidence. Randomly selected RCT abstracts were annotated according to EvidenceMap, based on the consensus of 2 independent annotators, to train an NLP pipeline. In a user study, we measured how much EvidenceMap improved evidence comprehension, and we assessed its representational capacity by comparing annotations made with the EvidenceMap representation against freeform annotations made without any specific guidelines.
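The 3 levels can be pictured as nested data structures. The following is a minimal sketch only, not the authors' schema: the class names, field names, label strings, and the example values are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Entity:
    """Level 1: a Medical Evidence Entity, modeled here as a labeled text span."""
    text: str
    label: str  # hypothetical label set, e.g. "Participant", "Intervention"


@dataclass
class Proposition:
    """Level 2: a group of entities that together state one finding."""
    entities: List[Entity]

    def by_label(self, label: str) -> List[str]:
        # Return the text of every entity carrying the given label.
        return [e.text for e in self.entities if e.label == label]


@dataclass
class EvidenceMap:
    """Level 3: all propositions drawn from a single RCT abstract."""
    source_id: str  # hypothetical identifier for the abstract
    propositions: List[Proposition] = field(default_factory=list)

    def entity_count(self) -> int:
        # Total number of entities across all propositions.
        return sum(len(p.entities) for p in self.propositions)


# Invented example content, not taken from any annotated abstract.
example = EvidenceMap(
    source_id="example-abstract",
    propositions=[
        Proposition(
            entities=[
                Entity("adults with condition X", "Participant"),
                Entity("drug A", "Intervention"),
                Entity("placebo", "Comparator"),
                Entity("mortality", "Outcome"),
            ]
        )
    ],
)

print(example.entity_count())
print(example.propositions[0].by_label("Outcome"))
```

Keeping entities as immutable labeled spans and grouping them into propositions mirrors the hierarchy described above; a real implementation would likely also record character offsets and relations between entities.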
Results
Two corpora, comprising 229 disease-agnostic and 80 COVID-19 RCT abstracts, were annotated, yielding 12 725 entities and 1602 propositions. EvidenceMap reduced users' reading time by 51.9% compared with reading raw-text abstracts. Most evidence elements identified during the freeform annotation were successfully represented by EvidenceMap, and users gave the enrollment, study design, and study results sections mean 5-point Likert ratings of 4.85, 4.70, and 4.20, respectively. The end-to-end evaluations of the pipeline show that the evidence proposition formulation achieves an F1 score of 0.84 and an adjusted Rand index score of 0.86.
Conclusions
EvidenceMap extends the participant, intervention, comparator, and outcome (PICO) framework into 3 levels of abstraction for transforming free-text evidence from the clinical literature into a computable structure. It can serve as an interoperable format for better evidence retrieval and synthesis and as an interpretable representation for efficiently comprehending RCT findings.
Funder
Bridging the semantic gap between research eligibility criteria and clinical data
Publisher
Oxford University Press (OUP)
Cited by
7 articles.