Comparing ChatGPT and GPT-4 performance in USMLE soft skill assessments

Published:2023-10-01 Issue:1 Volume:13
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Brin Dana, Sorin Vera, Vaid Akhil, Soroush Ali, Glicksberg Benjamin S., Charney Alexander W., Nadkarni Girish, Klang Eyal

Abstract

The United States Medical Licensing Examination (USMLE) has been a subject of performance study for artificial intelligence (AI) models. However, their performance on questions involving USMLE soft skills remains unexplored. This study aimed to evaluate ChatGPT and GPT-4 on USMLE questions involving communication skills, ethics, empathy, and professionalism. We used 80 USMLE-style questions involving soft skills, taken from the USMLE website and the AMBOSS question bank. A follow-up query was used to assess the models’ consistency. The performance of the AI models was compared to that of previous AMBOSS users. GPT-4 outperformed ChatGPT, correctly answering 90% compared to ChatGPT’s 62.5%. GPT-4 showed more confidence, not revising any responses, while ChatGPT modified its original answers 82.5% of the time. The performance of GPT-4 was higher than that of AMBOSS’s past users. Both AI models, notably GPT-4, showed capacity for empathy, indicating AI’s potential to meet the complex interpersonal, ethical, and professional demands intrinsic to the practice of medicine.

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

https://www.nature.com/articles/s41598-023-43436-9.pdf

Reference 17 articles.

1. Jiang, L. Y. et al. Health system-scale language models are all-purpose prediction engines. Nature 619, 357–362 (2023).

2. Liebrenz, M., Schleifer, R., Buadze, A., Bhugra, D. & Smith, A. Generating scholarly content with ChatGPT: Ethical challenges for medical publishing. Lancet Digit. Health 5, e105–e106 (2023).

3. Nazario-Johnson, L., Zaki, H. A. & Tung, G. A. Use of large language models to predict neuroimaging. J. Am. Coll. Radiol. https://doi.org/10.1016/j.jacr.2023.06.008 (2023).

4. Sorin, V., Barash, Y., Konen, E. & Klang, E. Large language models for oncological applications. J. Cancer Res. Clin. Oncol. https://doi.org/10.1007/s00432-023-04824-w (2023).

5. Li, R., Kumar, A. & Chen, J. H. How chatbots and large language model artificial intelligence systems will reshape modern medicine: Fountain of creativity or Pandora’s box? JAMA Intern. Med. 183, 596 (2023).

Cited by 82 articles.

1. Beyond the Scalpel: Assessing ChatGPT's potential as an auxiliary intelligent virtual assistant in oral surgery;Computational and Structural Biotechnology Journal;2024-12

2. ChatGPT and neurosurgical education: A crossroads of innovation and opportunity;Journal of Clinical Neuroscience;2024-11

3. Enhancing healthcare with intelligent environments: Integrating medical knowledge into GPT for advanced medical personal chatbots;Journal of Smart Cities and Society;2024-09-05

4. Putting GPT-4o to the Sword: A Comprehensive Evaluation of Language, Vision, Speech, and Multimodal Proficiency;Applied Sciences;2024-09-03

5. The promise and challenges of generative AI in education;Behaviour & Information Technology;2024-09-02