Performance of ChatGPT on basic healthcare leadership and management questions-Reference-Cited by-同舟云学术

Performance of ChatGPT on basic healthcare leadership and management questions

Published:2024-08-01 Issue: Volume: Page:
ISSN:2190-7188
Container-title:Health and Technology
language:en
Short-container-title:Health Technol.

Author:

Leutz-Schmidt Patricia,Grözinger Martin,Kauczor Hans-Ulrich,Jang Hyungseok,Sedaghat Sam^ORCID

Abstract

Abstract Purpose ChatGPT is an LLM-based chatbot introduced in 2022. This study investigates the performance of ChatGPT-3.5 and ChatGPT-4 on basic healthcare leadership and management questions. Methods ChatGPT-3.5 and -4 (OpenAI, San Francisco, CA, USA) generated answers to 24 pre-selected questions on three different areas of management and leadership in medical practice: group 1) accessing management/leadership training, group 2) management/leadership basics, group 3) department management/leadership. Three readers independently evaluated the answers provided by the two versions of ChatGPT. Three 4-digit scores were developed to assess the quality of the responses: 1) overall quality score (OQS), 2) understandibility score (US), and 3) implementability score (IS). The mean quality score (MQS) was calculated from these three scores. Results The interrater agreement was good for ChatGPT-4 (72%) and moderate for ChatGPT-3.5 (56%). The MQS of all questions reached a mean score of 3,42 (SD: 0,64) using ChatGPT-3.5 and 3,75 (SD: 0,47) using ChatGPT-4. ChatGPT-4 showed significantly higher MQS scores in group 2 and 3 questions than ChatGPT-3.5 (p = 0.039 and p < 0.001, respectively). Also, significant differences between ChatGPT-3.5 and ChatGPT-4 regarding OQS, US, and IS in group 3 questions were seen with significances reaching p < 0.001. Significant differences between the two chatbot versions were also present regarding OQS in question groups 1 and 2 (p = 0.035 each). 87.5% of the answers provided by ChatGPT-4 (21 of 24 answers) were considered superior to the answers provided by ChatGPT-3.5 for the same questions. Neither ChatGPT-3.5 nor ChatGPT-4 offered any inaccurate answers. Conclusion ChatGPT-3.5 and ChatGPT-4 performed well on basic healthcare leadership and management questions, while ChatGPT-4 was superior.

Funder

Universitätsklinikum Heidelberg

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s12553-024-00897-w.pdf

Reference27 articles.

1. Pavli A, Theodoridou M, Maltezou HC. Post-COVID Syndrome: Incidence, Clinical Spectrum, and Challenges for Primary Healthcare Professionals. Arch Med Res. 2021;52(6):575–81. https://doi.org/10.1016/j.arcmed.2021.03.010.

2. Shaheen MY. Applications of Artificial Intelligence (AI) in healthcare: A review. ScienceOpen Preprints 2021. https://doi.org/10.14293/S2199-1006.1.SOR-.PPVRY8K.v1

3. van der Schaar M, Alaa AM, Floto A, Gimson A, Scholtes S, Wood A, et al. How artificial intelligence and machine learning can help healthcare systems respond to COVID-19. Mach Learn. 2021;110(1):1–14. https://doi.org/10.1007/s10994-020-05928-x.

4. Liebrenz M, Schleifer R, Buadze A, Bhugra D, Smith A. Generating scholarly content with ChatGPT: ethical challenges for medical publishing. Lancet Digit Health. 2023;5(3):e105–6. https://doi.org/10.1016/S2589-7500(23)00019-5.

5. Sedaghat S. Success Through Simplicity: What Other Artificial Intelligence Applications in Medicine Should Learn from History and ChatGPT. Ann Biomed Eng. 2023. https://doi.org/10.1007/s10439-023-03287-x.