Clinical decision support for bipolar depression using large language models

Published:2024-03-13 Issue:9 Volume:49 Page:1412-1416
ISSN:0893-133X
Container-title:Neuropsychopharmacology
language:en
Short-container-title:Neuropsychopharmacol.

Author:

Perlis Roy H., Goldberg Joseph F., Ostacher Michael J., Schneck Christopher D.

Abstract

Management of depressive episodes in bipolar disorder remains challenging for clinicians despite the availability of treatment guidelines. In other contexts, large language models have yielded promising results for supporting clinical decision-making. We developed 50 sets of clinical vignettes reflecting bipolar depression and presented them to experts in bipolar disorder, who were asked to identify 5 optimal next-step pharmacotherapies and 5 poor or contraindicated choices. The same vignettes were then presented to a large language model (GPT4-turbo; gpt-4-1106-preview), with or without augmentation by prompting with recent bipolar treatment guidelines, and asked to identify the optimal next-step pharmacotherapy. Overlap between model output and gold standard was estimated. The augmented model prioritized the expert-designated optimal choice for 508/1000 vignettes (50.8%, 95% CI 47.7–53.9%; Cohen’s kappa = 0.31, 95% CI 0.28–0.35). For 120 vignettes (12.0%), at least one model choice was among the poor or contraindicated treatments. Results were not meaningfully different when gender or race of the vignette was permuted to examine risk for bias. By comparison, an un-augmented model identified the optimal treatment for 234 (23.4%, 95% CI 20.8–26.0%; McNemar’s p < 0.001 versus augmented model) of the vignettes. A sample of community clinicians scoring the same vignettes identified the optimal choice for 23.1% (95% CI 15.7–30.5%) of vignettes, on average; McNemar’s p < 0.001 versus augmented model. Large language models prompted with evidence-based guidelines represent a promising, scalable strategy for clinical decision support. In addition to prospective studies of efficacy, strategies to avoid clinician overreliance on such models, and address the possibility of bias, will be needed.
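The proportions and confidence intervals quoted in the abstract can be checked from the raw counts. The sketch below is illustrative only: the abstract does not say how the intervals were computed, so the normal-approximation (Wald) interval is an assumption here, and the function name `wald_ci` is hypothetical. The counts 508 and 234 out of 1000 are taken from the abstract.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(successes: int, n: int, level: float = 0.95):
    """Return (proportion, lower, upper) for a binomial proportion.

    Uses the normal-approximation (Wald) interval. This method is an
    assumption; the abstract does not state which one the authors used.
    """
    p = successes / n
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# Counts reported in the abstract, out of 1000 vignette presentations.
for label, k in [("augmented", 508), ("un-augmented", 234)]:
    p, lo, hi = wald_ci(k, 1000)
    print(f"{label}: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Under this assumption the intervals come out as 47.7 to 53.9% for the augmented model and 20.8 to 26.0% for the un-augmented model, which agrees with the bounds reported in the abstract.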

Funder

U.S. Department of Health & Human Services | NIH | National Institute of Mental Health

Publisher

Springer Science and Business Media LLC

Link

https://www.nature.com/articles/s41386-024-01841-2.pdf

References (21 articles)

1. Biazus TB, Beraldi GH, Tokeshi L, Rotenberg LdeS, Dragioti E, Carvalho AF, et al. All-cause and cause-specific mortality among people with bipolar disorder: a large-scale systematic review and meta-analysis. Mol Psychiatry. 2023;28:2508–24.

2. Gitlin MJ. Antidepressants in bipolar depression: an enduring controversy. Int J Bipolar Disord. 2018;6:25.

3. Pacchiarotti I, Bond DJ, Baldessarini RJ, Nolen WA, Grunze H, Licht RW, et al. The International Society for Bipolar Disorders (ISBD) task force report on antidepressant use in bipolar disorders. Am J Psychiatry. 2013;170:1249–62.

4. Goldberg JF, Freeman MP, Bacon R, Citrome L, Thase ME, Kane JM, et al. The American Society of Clinical Psychopharmacology survey of psychopharmacologists’ practice patterns for the treatment of mood disorders. Depress Anxiety. 2015;32:605–13.

5. Sakurai H, Kato M, Yasui-Furukori N, Suzuki T, Baba H, Watanabe K, et al. Pharmacological management of bipolar disorder: Japanese expert consensus. Bipolar Disord. 2020;22:822–30.

Cited by 4 articles.

1. Generating a Benzodiazepine Taper Using a Large Language Model: Feasibility Study (Preprint);2024-08-19

2. Multimodal Large Language Model Passes Specialty Board Examination and Surpasses Human Test-Taker Scores: A Comparative Analysis Examining the Stepwise Impact of Model Prompting Strategies on Performance;2024-07-29

3. Opportunities and risks of large language models in psychiatry;NPP—Digital Psychiatry and Neuroscience;2024-05-24

4. Exploring the Efficacy and Potential of Large Language Models for Depression: A Systematic Review;2024-05-07