Utilizing ChatGPT to select literature for meta-analysis shows workload reduction while maintaining a similar recall level as manual curation-Reference-Cited by-同舟云学术

Utilizing ChatGPT to select literature for meta-analysis shows workload reduction while maintaining a similar recall level as manual curation

Published:2023-09-07 Issue: Volume: Page:
ISSN:
Container-title:
language:
Short-container-title:

Author:

Cai Xiangming^ORCID,Geng Yuanming,Du Yiming,Westerman Bart,Wang Duolao^ORCID,Ma Chiyuan,Vallejo Juan J. Garcia

Abstract

AbstractBackgroundLarge language models (LLMs) like ChatGPT showed great potential in aiding medical research. A heavy workload in filtering records is needed during the research process of evidence-based medicine, especially meta-analysis. However, no study tried to use LLMs to help screen records in meta-analysis. In this research, we aimed to explore the possibility of incorporating ChatGPT to facilitate the screening step based on the title and abstract of records during meta-analysis.MethodsTo assess our strategy, we selected three meta-analyses from the literature, together with a glioma meta-analysis embedded in the study, as additional validation. For the automatic selection of records from curated meta-analyses, a four-step strategy called LARS was developed, consisting of (1) criteria selection and single-prompt (prompt with one criterion) creation, (2) best combination identification, (3) combined-prompt (prompt with one or more criteria) creation, and (4) request sending and answer summary. We evaluated the robustness of the response from ChatGPT with repeated requests. Recall, workload reduction, precision, and F1 score were calculated to assess the performance of LARS.FindingsChatGPT showed a stable response for repeated requests (robustness score: 0·747 – 0·996). A variable performance was found between different single-prompts with a mean recall of 0·841. Based on these single-prompts, we were able to find combinations with performance better than the pre-set threshold. Finally, with a best combination of criteria identified, LARS showed a 39·5% workload reduction on average with a recall greater than 0·9. In the glioma meta-analysis, we found no prognostic effect of CD8+ TIL on overall survival, progress-free survival, and survival time after immunotherapy.InterpretationWe show here the groundbreaking finding that automatic selection of literature for meta-analysis is possible with ChatGPT. We provide it here as a pipeline, LARS, which showed a great workload reduction while maintaining a pre-set recall.FundingChina Scholarship Council.

Publisher

Cold Spring Harbor Laboratory

Reference29 articles.

1. The next generation of evidence-based medicine

2. A Deep Learning Approach to Refine the Identification of High-Quality Clinical Research Articles From the Biomedical Literature: Protocol for Algorithm Development and Validation;JMIR Res Protoc,2021

3. Artificial intelligence in COVID-19 evidence syntheses was underutilized, but impactful: a methodological study;J Clin Epidemiol,2022

4. The semi-automation of title and abstract screening: a retrospective exploration of ways to leverage Abstrackr’s relevance predictions in systematic and rapid reviews;BMC Med Res Methodol,2020

5. Performance of active learning models for screening prioritization in systematic reviews: a simulation study into the Average Time to Discover relevant records;Syst Rev,2023

Cited by 2 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Potential Roles of Large Language Models in the Production of Systematic Reviews and Meta-Analyses;Journal of Medical Internet Research;2024-06-25

2. Addressing Data Scarcity in the Medical Domain: A GPT-Based Approach for Synthetic Data Generation and Feature Extraction;Information;2024-05-06