Performance of a Large Language Model in Screening Citations

Published: 2024-07-08 Issue: 7 Volume: 7 Page: e2420496
ISSN: 2574-3805
Container-title: JAMA Network Open
Language: en
Short-container-title: JAMA Netw Open

Author:

Oami Takehiko¹, Okada Yohei²,³, Nakada Taka-aki¹

Affiliation:

1. Department of Emergency and Critical Care Medicine, Chiba University Graduate School of Medicine, Chiba, Japan

2. Department of Preventive Services, Kyoto University Graduate School of Medicine, Kyoto, Japan

3. Health Services and Systems Research, Duke-NUS Medical School, National University of Singapore, Singapore

Abstract

Importance: Large language models (LLMs) are promising as tools for citation screening in systematic reviews. However, their applicability has not yet been determined.

Objective: To evaluate the accuracy and efficiency of an LLM in title and abstract literature screening.

Design, Setting, and Participants: This prospective diagnostic study used the data from the title and abstract screening process for 5 clinical questions (CQs) in the development of the Japanese Clinical Practice Guidelines for Management of Sepsis and Septic Shock. The LLM decided to include or exclude citations based on the inclusion and exclusion criteria in terms of patient, population, problem; intervention; comparison; and study design of the selected CQ and was compared with the conventional method for title and abstract screening. This study was conducted from January 7 to 15, 2024.

Exposures: LLM (GPT-4 Turbo)–assisted citation screening or the conventional method.

Main Outcomes and Measures: The sensitivity and specificity of the LLM-assisted screening process was calculated, and the full-text screening result using the conventional method was set as the reference standard in the primary analysis. Pooled sensitivity and specificity were also estimated, and screening times of the 2 methods were compared.

Results: In the conventional citation screening process, 8 of 5634 publications in CQ 1, 4 of 3418 in CQ 2, 4 of 1038 in CQ 3, 17 of 4326 in CQ 4, and 8 of 2253 in CQ 5 were selected. In the primary analysis of 5 CQs, LLM-assisted citation screening demonstrated an integrated sensitivity of 0.75 (95% CI, 0.43 to 0.92) and specificity of 0.99 (95% CI, 0.99 to 0.99). Post hoc modifications to the command prompt improved the integrated sensitivity to 0.91 (95% CI, 0.77 to 0.97) without substantially compromising specificity (0.98 [95% CI, 0.96 to 0.99]). Additionally, LLM-assisted screening was associated with reduced time for processing 100 studies (1.3 minutes vs 17.2 minutes for conventional screening methods; mean difference, −15.25 minutes [95% CI, −17.70 to −12.79 minutes]).

Conclusions and Relevance: In this prospective diagnostic study investigating the performance of LLM-assisted citation screening, the model demonstrated acceptable sensitivity and reasonably high specificity with reduced processing time. This novel method could potentially enhance efficiency and reduce workload in systematic reviews.
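
Illustration (not part of the published abstract): the sensitivity and specificity reported above are the usual per-question proportions from a 2x2 table of LLM decisions against the reference standard. The sketch below shows that arithmetic only. The class and function names are hypothetical, and the split of LLM decisions is invented for illustration; only the totals for CQ 1 (8 included of 5634 screened) come from the abstract. The pooled estimates and confidence intervals in the abstract come from a meta-analytic model that is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """2x2 counts for one clinical question (hypothetical container)."""
    true_pos: int   # included by the reference standard and by the LLM
    false_neg: int  # included by the reference standard, excluded by the LLM
    true_neg: int   # excluded by the reference standard and by the LLM
    false_pos: int  # excluded by the reference standard, included by the LLM


def sensitivity(c: ScreeningCounts) -> float:
    """Share of reference-standard inclusions that the LLM also included."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ScreeningCounts) -> float:
    """Share of reference-standard exclusions that the LLM also excluded."""
    return c.true_neg / (c.true_neg + c.false_pos)


# Assumption: only the totals (8 inclusions among 5634 citations) are from
# the abstract; the LLM's hits and misses below are made-up numbers.
cq1_example = ScreeningCounts(true_pos=6, false_neg=2, true_neg=5570, false_pos=56)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(cq1_example):.2f}")
    print(f"specificity = {specificity(cq1_example):.2f}")
```

With so few included studies per question, a single missed citation moves sensitivity by a large step, which is consistent with the wide confidence interval reported for the integrated sensitivity.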

Publisher

American Medical Association (AMA)

Link

https://jamanetwork.com/journals/jamanetworkopen/articlepdf/2820861/oami_2024_oi_240660_1719580325.87415.pdf

Reference: 27 articles.

1. Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry. Borah; BMJ Open, 2017

2. Precision of healthcare systematic review searches in a cross-sectional sample. Sampson; Res Synth Methods, 2011

3. Error rates of human reviewers during abstract screening in systematic reviews. Wang; PLoS One, 2020

4. An open source machine learning framework for efficient and transparent systematic reviews. van de Schoot; Nat Mach Intell, 2021

5. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Marshall; Syst Rev, 2019

Cited by 1 article.

1. LLMscreen: A Python Package for Systematic Review Screening of Scientific Texts Using Prompt Engineering; 2024-09-11