Abstract
Introduction: Large language models perform well on a range of academic tasks, including medical examinations, but the performance of this class of models in psychopharmacology has not been explored.

Method: ChatGPT Plus, implementing the GPT-4 large language model, was presented with each of 10 previously studied antidepressant prescribing vignettes in randomized order, with results regenerated 5 times to evaluate the stability of responses. Results were compared to expert consensus.

Results: At least one of the optimal medication choices was included among the best choices in 38/50 (76%) vignettes: in 5/5 regenerations for 7 vignettes, 3/5 for 1, and 0/5 for 2. At least one of the poor-choice or contraindicated medications was included among the choices considered optimal or good in 24/50 (48%) of vignettes. As rationale for treatment selection, the model cited multiple heuristics, including avoiding previously unsuccessful medications, avoiding adverse effects based on comorbidities, and generalizing within a medication class.

Conclusion: The model appeared to identify and apply a number of heuristics commonly used in psychopharmacologic clinical practice. However, the inclusion of less optimal recommendations indicates that large language models may pose a substantial risk if routinely applied to guide psychopharmacologic treatment without further monitoring.
Publisher
Cold Spring Harbor Laboratory
Cited by
12 articles.