Reasoning with large language models for medical question answering

Published:2024-07-03 Issue:9 Volume:31 Page:1964-1975
ISSN:1067-5027
Container-title:Journal of the American Medical Informatics Association
language:en
Author:

Lucas Mary M¹, Yang Justin², Pomeroy Jon K¹³, Yang Christopher C¹

Affiliation:

1. College of Computing and Informatics, Drexel University, Philadelphia, PA 19104, United States

2. Department of Computer Science, University of Maryland, College Park, MD 20742, United States

3. Penn Medicine, Philadelphia, PA 19104, United States

Abstract

Objectives: To investigate approaches to reasoning with large language models (LLMs) and to propose a new prompting approach, ensemble reasoning, that improves medical question answering performance through refined reasoning and reduced inconsistency.

Materials and Methods: We evaluated the proposed ensemble reasoning approach on multiple-choice questions from the USMLE Sample Exam question files, using 2 closed-source commercial LLMs and 1 open-source clinical LLM.

Results: On GPT-3.5 turbo and Med42-70B, ensemble reasoning outperformed zero-shot chain-of-thought with self-consistency on Step 1, 2, and 3 questions, by +3.44%, +4.00%, and +2.54% for GPT-3.5 turbo and by +2.3%, +5.00%, and +4.15% for Med42-70B. With GPT-4 turbo, results were mixed, with ensemble reasoning again outperforming zero-shot chain-of-thought with self-consistency on Step 1 questions (+1.15%). In all cases, responses were more consistent with our approach. A qualitative analysis of the model's reasoning showed that ensemble reasoning produces correct and helpful reasoning.

Conclusion: The proposed iterative ensemble reasoning has the potential to improve the performance of LLMs on medical question answering tasks, particularly for less powerful LLMs such as GPT-3.5 turbo and Med42-70B, which suggests that it is a promising approach for lower-capability LLMs. The findings also show that our approach helps refine the reasoning generated by the LLM and thereby improves consistency, even with the more powerful GPT-4 turbo. We also identify the potential and need for human-artificial intelligence teaming to improve the reasoning beyond the limits of the model.

Funder

National Science Foundation

Department of Defense

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/jamia/article-pdf/31/9/1964/58868063/ocae131.pdf

References (29 articles)

1. Natural language processing: from bedside to everywhere;Aramaki;Yearbook Med Informat,2022

2. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead;Rudin;Nature Mach Intell,2019

3. The black box problem revisited. Real and imaginary challenges for automated legal decision making;Brożek;Artif Intell Law,2024;32:427-440.

4. Using fine-tuned large language models to parse clinical notes in musculoskeletal pain disorders;Vaid;Lancet Digital Health,2023

Cited by 1 article

1. Large language models in biomedicine and health: current research landscape and future directions;Journal of the American Medical Informatics Association;2024-08-22