Author:
Oami Takehiko,Okada Yohei,Nakada Taka-aki
Abstract
AbstractBackgroundSystematic reviews require labor-intensive and time-consuming processes. Large language models (LLMs) have been recognized as promising tools for citation screening; however, the performance of LLMs in screening citations remained to be determined yet. This study aims to evaluate the potential of three leading LLMs - GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet for literature screening.MethodsWe will conduct a prospective study comparing the accuracy, efficiency, and cost of literature citation screening using the three LLMs. Each model will perform literature searches for predetermined clinical questions from the Japanese Clinical Practice Guidelines for Management of Sepsis and Septic Shock (J-SSCG). We will measure and compare the time required for citation screening using each method. The sensitivity and specificity of the results from the conventional approach and each LLM-assisted process will be calculated and compared. Additionally, we will assess the total time spent and associated costs for each method to evaluate workload reduction and economic efficiency.Trial registrationThis research is submitted with the University hospital medical information network clinical trial registry (UMIN-CTR) [UMIN000054783].
Publisher
Cold Spring Harbor Laboratory