Comparative Analysis of Large Language Models against the NHS 111 Online Triaging for Emergency Ophthalmology

Authors:

Khan Shaheryar¹, Gunasekera Chrishan²

Affiliations:

1. Colchester General Hospital

2. Norfolk & Norwich University Hospital

Abstract

Background

This study presents a comprehensive evaluation of how various large language models (LLMs) perform when generating responses to ophthalmology emergencies, and compares their accuracy with the established NHS 111 online triage system.

Methods

We included 21 ophthalmology-related emergency scenario questions from the NHS 111 triage algorithm. These questions covered four ophthalmology emergency themes, as laid out in the NHS 111 algorithm. The responses generated by NHS 111 online were compared with the responses of several LLM chatbots: ChatGPT-3.5, Google Bard, Bing Chat, and ChatGPT-4.0. The accuracy of each chatbot response was compared against NHS 111 triage using a two-prompt strategy. Two authors graded the answers separately on the following scale: −2, “Very poor”; −1, “Poor”; 0, “No response”; 1, “Good”; 2, “Very good”; 3, “Excellent”.

Results

An overall score of ≥ 1 (“Good” or better) was achieved by 93% of all LLM responses. A score in this range means that at least part of the answer contained correct information and partially matched the NHS 111 response, and that the answer contained no incorrect information or advice potentially harmful to the patient’s health.

Conclusions

The high accuracy and safety observed in LLM responses support their potential as effective tools for providing patients with timely information and guidance. While further research is warranted to validate these findings in clinical practice, LLMs hold promise for enhancing patient care and healthcare accessibility in the digital age.
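To make the grading threshold concrete, the Python sketch below maps the six-point scale described in the Methods to its labels and computes the share of responses scoring ≥ 1 (“Good” or better). It is an illustration under stated assumptions, not the authors' analysis code. The abstract does not say how the two graders' scores were combined into an overall score, so the function simply accepts one overall score per response, which may be non-integer if the two grades were averaged. The names `GRADE_LABELS` and `share_good_or_better` and the example scores are hypothetical and are not the study's data.

```python
# Grading scale as listed in the abstract: score -> label.
GRADE_LABELS = {
    -2: "Very poor",
    -1: "Poor",
    0: "No response",
    1: "Good",
    2: "Very good",
    3: "Excellent",
}


def share_good_or_better(scores):
    """Return the fraction of responses whose overall score is >= 1.

    Assumption (not stated in the abstract): each response has already been
    reduced to a single overall score on the -2..3 scale, e.g. by averaging
    the two graders' scores.
    """
    if not scores:
        raise ValueError("no scores given")
    for s in scores:
        if not -2 <= s <= 3:
            raise ValueError(f"score {s} is outside the -2..3 scale")
    # Count responses graded "Good" (1) or better.
    return sum(1 for s in scores if s >= 1) / len(scores)


# Hypothetical overall scores for ten responses (illustration only).
example = [3, 2, 1, -1, 0, 2, 3, 1, 2, 3]
print(f"{share_good_or_better(example):.0%}")  # prints "80%"
```

In this toy list, eight of the ten scores are 1 or higher, so the share is 80%. The study reports 93% for its own responses.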

Publisher

Springer Science and Business Media LLC
