Comparing the performance of ChatGPT GPT‐4, Bard, and Llama‐2 in the Taiwan Psychiatric Licensing Examination and in differential diagnosis with multi‐center psychiatrists

Author:

Li Dian‐Jeng (1, 2), Kao Yu‐Chen (3, 4), Tsai Shih‐Jen (5, 6), Bai Ya‐Mei (5, 6, 7), Yeh Ta‐Chuan (3), Chu Che‐Sheng (8, 9, 10, 11), Hsu Chih‐Wei (12), Cheng Szu‐Wei (13, 14), Hsu Tien‐Wei (15, 16), Liang Chih‐Sung (3, 4), Su Kuan‐Pin (14, 17, 18)

Affiliation:

1. Department of Addiction Science, Kaohsiung Municipal Kai‐Syuan Psychiatric Hospital, Kaohsiung, Taiwan

2. Department of Nursing, Meiho University, Pingtung, Taiwan

3. Department of Psychiatry, Tri‐Service General Hospital, National Defense Medical Center, Taipei, Taiwan

4. Department of Psychiatry, Tri‐Service General Hospital, Beitou branch, Taipei, Taiwan

5. Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan

6. Department of Psychiatry, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan

7. Institute of Brain Science, National Yang Ming Chiao Tung University, Taipei, Taiwan

8. Center for Geriatric and Gerontology, Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan

9. Non‐invasive Neuromodulation Consortium for Mental Disorders, Society of Psychophysiology, Taipei, Taiwan

10. Graduate Institute of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan

11. Department of Psychiatry, Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan

12. Department of Psychiatry, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan

13. Department of General Medicine, Chi Mei Medical Center, Tainan, Taiwan

14. Mind‐Body Interface Laboratory (MBI‐Lab) and Department of Psychiatry, China Medical University Hospital, Taichung, Taiwan

15. Department of Psychiatry, E‐DA Dachang Hospital, I‐Shou University, Kaohsiung, Taiwan

16. Department of Psychiatry, E‐DA Hospital, I‐Shou University, Kaohsiung, Taiwan

17. College of Medicine, China Medical University, Taichung, Taiwan

18. An‐Nan Hospital, China Medical University, Tainan, Taiwan

Abstract

Aim: Large language models (LLMs) have been suggested to play a role in medical education and medical practice. However, their potential in the psychiatric domain has not been well studied.

Method: First, we compared the performance of ChatGPT GPT‐4, Bard, and Llama‐2 in the 2022 Taiwan Psychiatric Licensing Examination, conducted in traditional Mandarin. Second, we compared the scores of these three LLMs with those of 24 experienced psychiatrists on 10 advanced clinical scenario questions designed for psychiatric differential diagnosis.

Result: Only GPT‐4 passed the 2022 Taiwan Psychiatric Licensing Examination, scoring 69 (a score of ≥ 60 is a passing grade), while Bard scored 36 and Llama‐2 scored 25. GPT‐4 outperformed Bard and Llama‐2, especially in the areas of ‘Pathophysiology & Epidemiology’ (χ² = 22.4, P < 0.001) and ‘Psychopharmacology & Other therapies’ (χ² = 15.8, P < 0.001). In the differential diagnosis, the mean score of the 24 experienced psychiatrists (mean 6.1, standard deviation 1.9) was higher than the scores of GPT‐4 (5), Bard (3), and Llama‐2 (1).

Conclusion: Compared to Bard and Llama‐2, GPT‐4 demonstrated superior abilities in identifying psychiatric symptoms and making clinical judgments. In addition, GPT‐4's ability for differential diagnosis closely approached that of the experienced psychiatrists. Among the three LLMs, GPT‐4 showed promising potential as a valuable tool in psychiatric practice.
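To make the differential-diagnosis comparison concrete, the reported figures can be expressed as standardized gaps from the psychiatrist mean. The sketch below is only an illustration: it uses the mean, standard deviation, and LLM scores stated in the abstract, but the z-score framing and the names `standardized_gap` and `gaps` are assumptions introduced here, not an analysis reported by the authors. It also assumes all scores are on the same scale.

```python
# Figures taken from the abstract. The z-score framing is an illustrative
# assumption, not a statistic reported by the authors.
PSYCHIATRIST_MEAN = 6.1  # mean score of the 24 psychiatrists
PSYCHIATRIST_SD = 1.9    # standard deviation of their scores
LLM_SCORES = {"GPT-4": 5, "Bard": 3, "Llama-2": 1}


def standardized_gap(score, mean=PSYCHIATRIST_MEAN, sd=PSYCHIATRIST_SD):
    """Return how many psychiatrist SDs a score lies from the psychiatrist mean."""
    return (score - mean) / sd


# Hypothetical helper mapping: model name -> gap rounded to two decimals.
gaps = {name: round(standardized_gap(s), 2) for name, s in LLM_SCORES.items()}

if __name__ == "__main__":
    for name, gap in gaps.items():
        print(f"{name}: {gap:+.2f} SD relative to the psychiatrist mean")
```

Under these assumptions, GPT‐4 sits roughly 0.6 standard deviations below the psychiatrist mean, while Bard and Llama‐2 sit about 1.6 and 2.7 standard deviations below it. This is a descriptive comparison only and does not amount to a significance test.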

Publisher

Wiley

