Utility and Comparative Performance of Current Artificial Intelligence Large Language Models as Postoperative Medical Support Chatbots in Aesthetic Surgery

Published: 2024-02-06 Issue: 8 Volume: 44 Page: 889-896
ISSN: 1090-820X
Container-title: Aesthetic Surgery Journal
Language: en

Author:

Abi-Rafeh Jad, Henry Nader, Xu Hong Hao, Bassiri-Tehrani Brian, Arezki Adel, Kazan Roy, Gilardino Mirko S, Nahai Foad

Abstract

Background: Large language models (LLMs) have revolutionized the way plastic surgeons and their patients can access and leverage artificial intelligence (AI).

Objectives: The present study aims to compare the performance of 2 current publicly available and patient-accessible LLMs in the potential application of AI as postoperative medical support chatbots in an aesthetic surgeon's practice.

Methods: Twenty-two simulated postoperative patient presentations following aesthetic breast plastic surgery were devised and expert-validated. Complications varied in their latency within the postoperative period, as well as urgency of required medical attention. In response to each patient-reported presentation, OpenAI's ChatGPT and Google's Bard, in their unmodified and freely available versions, were objectively assessed for their comparative accuracy in generating an appropriate differential diagnosis, most-likely diagnosis, suggested medical disposition, treatments or interventions to begin from home, and/or red flag signs/symptoms indicating deterioration.

Results: ChatGPT cumulatively and significantly outperformed Bard across all objective assessment metrics examined (66% vs 55%, respectively; P < .05). Accuracy in generating an appropriate differential diagnosis was 61% for ChatGPT vs 57% for Bard (P = .45). ChatGPT asked an average of 9.2 questions on history vs Bard’s 6.8 questions (P < .001), with accuracies of 91% vs 68% reporting the most-likely diagnosis, respectively (P < .01). Appropriate medical dispositions were suggested with accuracies of 50% by ChatGPT vs 41% by Bard (P = .40); appropriate home interventions/treatments with accuracies of 59% vs 55% (P = .94), and red flag signs/symptoms with accuracies of 79% vs 54% (P < .01), respectively. Detailed and comparative performance breakdowns according to complication latency and urgency are presented.

Conclusions: ChatGPT represents the superior LLM for the potential application of AI technology in postoperative medical support chatbots. Imperfect performance and limitations discussed may guide the necessary refinement to facilitate adoption.

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/asj/advance-article-pdf/doi/10.1093/asj/sjae025/58282491/sjae025.pdf

References: 37 articles.

1. Large language models and artificial intelligence: a primer for plastic surgeons on the demonstrated and potential applications, promises, and limitations of ChatGPT;Abi-Rafeh;Aesthet Surg J,2024

2. Exploring the potential of artificial intelligence in surgery: insights from a conversation with ChatGPT;Hassan;Ann Surg Oncol,2023

3. Utilizing ChatGPT-4 for providing medical information on blepharoplasties to patients;Cox;Aesthet Surg J,2023

4. Aesthetic surgery advice and counseling from artificial intelligence: A rhinoplasty consultation with ChatGPT;Xie;Aesthetic Plast Surg,2023

Cited by 3 articles.

1. Large Language Models for Intraoperative Decision Support in Plastic Surgery: A Comparison between ChatGPT-4 and Gemini;Medicina;2024-06-08

2. Artificial Intelligence in Postoperative Care: Assessing Large Language Models for Patient Recommendations in Plastic Surgery;Healthcare;2024-05-24

3. A Prescription for Progress: The Aesthetic Society Welcomes Plastic Surgery Cores and Allied Professionals;Aesthetic Surgery Journal;2024-04-01