AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models

Published:2023-11-11 Issue:2 Volume:31 Page:375-385
ISSN:1067-5027
Container-title:Journal of the American Medical Informatics Association
language:en

Author:

Datta Surabhi¹, Lee Kyeryoung¹, Paek Hunki¹, Manion Frank J¹, Ofoegbu Nneka¹, Du Jingcheng¹, Li Ying², Huang Liang-Chin¹, Wang Jingqi¹, Lin Bin¹, Xu Hua³, Wang Xiaoyan¹

Affiliation:

1. Melax Technologies, Houston, TX 77030, United States

2. Regeneron Pharmaceuticals, Tarrytown, NY 10591, United States

3. Yale School of Medicine, New Haven, CT 06511, United States

Abstract

Objectives: We aim to build a generalizable information extraction system leveraging large language models to extract granular eligibility criteria information for diverse diseases from free text clinical trial protocol documents. We investigate the model’s capability to extract criteria entities along with contextual attributes including values, temporality, and modifiers and present the strengths and limitations of this system.

Materials and Methods: The clinical trial data were acquired from https://ClinicalTrials.gov/. We developed a system, AutoCriteria, which comprises the following modules: preprocessing, knowledge ingestion, prompt modeling based on GPT, postprocessing, and interim evaluation. The final system evaluation was performed, both quantitatively and qualitatively, on 180 manually annotated trials encompassing 9 diseases.

Results: AutoCriteria achieves an overall F1 score of 89.42 across all 9 diseases in extracting the criteria entities, with the highest being 95.44 for nonalcoholic steatohepatitis and the lowest of 84.10 for breast cancer. Its overall accuracy is 78.95% in identifying all contextual information across all diseases. Our thematic analysis indicated accurate logic interpretation of criteria as one of the strengths and overlooking/neglecting the main criteria as one of the weaknesses of AutoCriteria.

Discussion: AutoCriteria demonstrates strong potential to extract granular eligibility criteria information from trial documents without requiring manual annotations. The prompts developed for AutoCriteria generalize well across different disease areas. Our evaluation suggests that the system handles complex scenarios including multiple arm conditions and logics.

Conclusion: AutoCriteria currently encompasses a diverse range of diseases and holds potential to extend to more in the future. This signifies a generalizable and scalable solution, poised to address the complexities of clinical trial application in real-world settings.

Publisher

Oxford University Press (OUP)

Subject

Health Informatics

Link

https://academic.oup.com/jamia/article-pdf/31/2/375/56209472/ocad218.pdf

References: 35 articles.

1. Optimizing clinical research participant selection with informatics;Weng;Trends Pharmacol Sci,2015

2. Automated matching software for clinical trials eligibility: measuring efficiency and flexibility;Penberthy;Contemp Clin Trials,2010

3. Automated classification of clinical trial eligibility criteria text based on ensemble learning and metric learning;Zeng;BMC Med Inform Decis Mak,2021

4. EliIE: an open-source information extraction system for clinical trial eligibility criteria;Kang;J Am Med Inform Assoc,2017

Cited by 14 articles.

1. Prompt Engineering Paradigms for Medical Applications: Scoping Review;Journal of Medical Internet Research;2024-09-10

2. Automating biomedical literature review for rapid drug discovery: Leveraging GPT-4 to expedite pandemic response;International Journal of Medical Informatics;2024-09

3. A Methodology for Using Large Language Models to Create User-Friendly Applications for Medicaid Redetermination and Other Social Services;International Journal of Public Health;2024-08-16

4. Enhancing Large Language Models with Human Expertise for Disease Detection in Electronic Health Records;2024 IEEE International Conference on Digital Health (ICDH);2024-07-07

5. Potential application of artificial intelligence in cancer therapy;Current Opinion in Oncology;2024-06-24