The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria-Reference-Cited by-同舟云学术

The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria

Published:2022-08-11 Issue:1 Volume:9 Page:
ISSN:2052-4463
Container-title:Scientific Data
language:en
Short-container-title:Sci Data

Author:

Dobbins Nicholas J.^ORCID,Mullen Tony^ORCID,Uzuner Özlem,Yetisgen Meliha

Abstract

AbstractIdentifying cohorts of patients based on eligibility criteria such as medical conditions, procedures, and medication use is critical to recruitment for clinical trials. Such criteria are often most naturally described in free-text, using language familiar to clinicians and researchers. In order to identify potential participants at scale, these criteria must first be translated into queries on clinical databases, which can be labor-intensive and error-prone. Natural language processing (NLP) methods offer a potential means of such conversion into database queries automatically. However they must first be trained and evaluated using corpora which capture clinical trials criteria in sufficient detail. In this paper, we introduce the Leaf Clinical Trials (LCT) corpus, a human-annotated corpus of over 1,000 clinical trial eligibility criteria descriptions using highly granular structured labels capturing a range of biomedical phenomena. We provide details of our schema, annotation process, corpus quality, and statistics. Additionally, we present baseline information extraction results on this corpus as benchmarks for future work.

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences,Statistics, Probability and Uncertainty,Computer Science Applications,Education,Information Systems,Statistics and Probability

Link

https://www.nature.com/articles/s41597-022-01521-0.pdf

Reference28 articles.

1. Richesson, R. L. et al. Electronic health records based phenotyping in next-generation clinical trials: a perspective from the NIH Health Care Systems Collaboratory. Journal of the American Medical Informatics Association 20, e226–e231 (2013).

2. Dobbins, N. J. et al. Leaf: an open-source, model-agnostic, data-driven web application for cohort discovery and translational biomedical research. Journal of the American Medical Informatics Association 27, 109–118 (2019).

3. Murphy, S. N. et al. Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). Journal of the American Medical Informatics Association 17, 124–130 (2010).

4. Yuan, C. et al. Criteria2Query: A natural language interface to clinical databases for cohort definition. Journal of the American Medical Informatics Association 26, 294–305, https://doi.org/10.1093/jamia/ocy178 (2019).

5. Wang, P., Shi, T. & Reddy, C. K. A translate-edit model for natural language question to sql query generation on multi-relational healthcare data. arXiv preprint arXiv:1908.01839 (2019).

Cited by 7 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Machine learning and natural language processing in clinical trial eligibility criteria parsing: a scoping review;Drug Discovery Today;2024-10

2. Optimizing Clinical Trial Eligibility Design Using Natural Language Processing Models and Real-World Data: Algorithm Development and Validation;JMIR AI;2024-07-29

3. NLP Applications—Other Biomedical Texts;Cognitive Informatics in Biomedicine and Healthcare;2024

4. AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models;Journal of the American Medical Informatics Association;2023-11-11

5. LeafAI: query generator for clinical cohort discovery rivaling a human programmer;Journal of the American Medical Informatics Association;2023-08-07