Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study

Published: 2023-09-20 Volume: 10 Page: e51232
ISSN: 2368-7959
Container-title: JMIR Mental Health
Language: en
Short-container-title: JMIR Ment Health

Author:

Levkovich Inbar, Elyoseph Zohar

Abstract

Background: ChatGPT, a linguistic artificial intelligence (AI) model developed by OpenAI, may offer contributions to the work of mental health professionals. Although it has significant theoretical implications, its practical capabilities, particularly regarding suicide prevention, have not yet been substantiated.

Objective: The study aimed to evaluate ChatGPT’s ability to assess suicide risk, taking into consideration 2 discernible factors (perceived burdensomeness and thwarted belongingness) over a 2-month period. In addition, we evaluated whether ChatGPT-4 assessed suicide risk more accurately than ChatGPT-3.5.

Methods: ChatGPT was tasked with assessing a vignette depicting a hypothetical patient exhibiting differing degrees of perceived burdensomeness and thwarted belongingness. The assessments generated by ChatGPT were then compared with standard evaluations rendered by mental health professionals. Using both ChatGPT-3.5 and ChatGPT-4 (May 24, 2023), we carried out 3 evaluative procedures in June and July 2023. Our intent was to examine ChatGPT-4’s proficiency in assessing various facets of suicide risk relative to the evaluative abilities of both mental health professionals and an earlier version of ChatGPT-3.5 (March 14 version).

Results: During June and July 2023, the likelihood of suicide attempts as evaluated by ChatGPT-4 was similar to the norms of mental health professionals (n=379) under all conditions (average Z score of 0.01). By contrast, ChatGPT-3.5 (May version) markedly underestimated the potential for suicide attempts compared with the assessments of the mental health professionals (average Z score of -0.83). ChatGPT-4’s evaluations of the incidence of suicidal ideation and of psychache were higher than those of the mental health professionals (average Z scores of 0.47 and 1.00, respectively). Conversely, the level of resilience as assessed by both ChatGPT-4 and ChatGPT-3.5 (both versions) was lower than the assessments offered by mental health professionals (average Z scores of -0.89 and -0.90, respectively).

Conclusions: The findings suggest that ChatGPT-4 estimates the likelihood of suicide attempts in a manner akin to evaluations provided by professionals. In terms of recognizing suicidal ideation, ChatGPT-4 appears to be more precise. However, ChatGPT-4 overestimated psychache, indicating a need for further research. These results have implications for ChatGPT-4’s potential to support the decision-making of gatekeepers, patients, and even mental health professionals. Despite the clinical potential, intensive follow-up studies are necessary to establish the use of ChatGPT-4’s capabilities in clinical practice. The finding that ChatGPT-3.5 frequently underestimates suicide risk, especially in severe cases, is particularly troubling. It indicates that ChatGPT may downplay a person’s actual suicide risk level.
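The Z scores reported in the abstract express how far a model's rating lies from the professional norm, in units of the professionals' standard deviation. The abstract does not spell out the exact computation, so the following is only a minimal sketch under stated assumptions: the function names, the use of the sample standard deviation, and all ratings shown are hypothetical illustrations, not data or code from the study.

```python
from statistics import mean, stdev

def z_score(model_rating, professional_ratings):
    """Distance of one model rating from the professional mean,
    in units of the professionals' sample standard deviation
    (assumed here; the study may have used a different estimator)."""
    mu = mean(professional_ratings)
    sigma = stdev(professional_ratings)
    return (model_rating - mu) / sigma

def average_z(model_ratings, norms_by_condition):
    """Average the per-condition Z scores, one per vignette condition."""
    scores = [
        z_score(model_ratings[condition], norms_by_condition[condition])
        for condition in norms_by_condition
    ]
    return mean(scores)

# Hypothetical ratings for a single vignette condition.
professionals = [3, 4, 4, 5, 4, 3, 5, 4]
z_equal = z_score(4, professionals)   # model rating equal to the professional mean
z_lower = z_score(3, professionals)   # model rating below the professional mean
```

A Z score near 0 means the model's rating matches the professional norm; a negative value means the model rated the outcome lower than professionals did (an underestimate), and a positive value means it rated the outcome higher.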

Publisher

JMIR Publications Inc.

Subject

Psychiatry and Mental health

Cited by 33 articles.

1. Beyond the Scalpel: Assessing ChatGPT's potential as an auxiliary intelligent virtual assistant in oral surgery;Computational and Structural Biotechnology Journal;2024-12

2. Assessing the ChatGPT aptitude: A competent and effective Dermatology doctor?;Heliyon;2024-09

3. Generative Large Language Models in Electronic Health Records for Patient Care Since 2023: A Systematic Review;2024-08-12

4. World Psychiatric Association-Asian Journal of Psychiatry Commission on Public Mental Health;Asian Journal of Psychiatry;2024-08

5. The Role of Artificial Intelligence in the Primary Prevention of Common Musculoskeletal Diseases;Cureus;2024-07-25