UNSTRUCTURED
This study aims to assess the potential of large language models (LLMs) to enhance reporting efficiency and accuracy in oncological imaging, specifically evaluating their knowledge of RECIST 1.1 guidelines. While the capabilities of LLMs have been explored across various domains, their specific applications in radiology are of significant interest due to the intricate and time-consuming nature of image evaluation in oncology. We conducted a comparative analysis involving seven different LLMs and a general radiologist (GR) to determine their proficiency in responding to RECIST 1.1-based multiple-choice questions.
Our methodology involved the creation of 25 multiple-choice questions by a board-certified radiologist, ensuring alignment with RECIST 1.1 guidelines. These questions were presented to seven LLMs—Claude 3 Opus, ChatGPT 4, ChatGPT 4o, Gemini 1.5 Pro, Mistral Large, Meta Llama 3 70B, and Perplexity Pro—as well as to a GR with six years of experience. The LLMs were prompted to answer as an experienced radiologist, and their responses were compared to those of the GR.
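For illustration only, the scoring step reduces to comparing each respondent's selected option against the answer key and reporting the proportion correct. The short Python sketch below shows this accuracy calculation; the answer letters and responses are hypothetical placeholders, not the study's actual questions or data.

# Minimal sketch (not the study's code): score multiple-choice answers
# against an answer key and report per-responder accuracy.
# All answer letters below are illustrative placeholders.

ANSWER_KEY = ["C", "A", "D", "B", "A"]  # hypothetical key for 5 of the 25 items

responses = {
    "Claude 3 Opus": ["C", "A", "D", "B", "A"],
    "ChatGPT 4o": ["C", "A", "D", "B", "C"],
    "General radiologist": ["C", "B", "D", "B", "A"],
}

def score(answers, key):
    """Return (number correct, accuracy) for one responder."""
    correct = sum(a == k for a, k in zip(answers, key))
    return correct, correct / len(key)

for name, answers in responses.items():
    n_correct, accuracy = score(answers, ANSWER_KEY)
    print(f"{name}: {accuracy:.0%} ({n_correct}/{len(ANSWER_KEY)})")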
The results demonstrated that Claude 3 Opus achieved a perfect accuracy of 100% (25/25), followed closely by ChatGPT 4o with 96% (24/25). ChatGPT 4 and Mistral Large both scored 92% (23/25), while Meta Llama 3 70B, Perplexity Pro, and Gemini 1.5 Pro each scored 88% (22/25). The GR also achieved a score of 92% (23/25). These findings highlight the strong proficiency of current LLMs in understanding and applying RECIST 1.1 guidelines, suggesting their potential as valuable tools in radiology.
The outstanding performance of Claude 3 Opus raises the prospect of LLMs becoming integral to oncology practice, potentially enhancing the accuracy and efficiency of radiology reporting. However, the variation in performance across models underscores the need for further refinement and evaluation. Additionally, because this study was limited to text-based questions, the visual assessment capabilities of multimodal LLMs remain unexplored. Given the inherently visual nature of radiology, future research should investigate the integration of image analysis in LLMs to fully harness their potential in clinical settings.
In conclusion, our study underscores the high potential of LLMs to assist radiologists in oncological reporting, providing a consistent and reliable approach to interpreting RECIST 1.1 guidelines. These findings advocate for the continued development and integration of LLMs in radiology to enhance diagnostic accuracy and reporting efficiency.