Utilizing ChatGPT as a scientific reasoning engine to differentiate conflicting evidence and summarize challenges in controversial clinical questions

Published: 2024-05-17
Journal: Journal of the American Medical Informatics Association
Volume: 31, Issue: 7, Pages: 1551-1560
ISSN: 1067-5027
Language: English

Authors:

Xie Shiyao (1, 2), Zhao Wenjing (1, 2), Deng Guanghui (3), He Guohua (4), He Na (5), Lu Zhenhua (6), Hu Weihua (7), Zhao Mingming (8), Du Jian (1, 2)

Affiliations:

1. Institute of Medical Technology, Peking University Health Science Center, Beijing, 100191, China

2. National Institute of Health Data Science, Peking University, Beijing, 100191, China

3. School of Health Humanities, Peking University, Beijing, 100191, China

4. Department of Pediatric Nephrology and Rheumatology, Sun Yat-sen University First Affiliated Hospital, Guangzhou, Guangdong, 510062, China

5. Department of Pharmacy, Peking University Third Hospital, Beijing, 100089, China

6. Department of Gastrointestinal Cancer Translational Research Laboratory, Peking University Cancer Hospital, Beijing, 100143, China

7. Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, 100191, China

8. Department of Cardiology and Institute of Vascular Medicine, Peking University Third Hospital, Beijing, 100089, China

Abstract

Objective: Synthesizing and evaluating inconsistent medical evidence is essential in evidence-based medicine. This study aimed to employ ChatGPT as a sophisticated scientific reasoning engine to identify conflicting clinical evidence and summarize unresolved questions to inform further research.

Materials and Methods: We evaluated ChatGPT's effectiveness in identifying conflicting evidence and investigated the principles underlying its logical reasoning. An automated framework was developed to generate a PubMed dataset focused on controversial clinical topics. ChatGPT analyzed this dataset to identify consensus and controversy and to formulate unresolved research questions. Experts evaluated (1) the consensus and controversy summaries for factual consistency, comprehensiveness, and potential harm, and (2) the research questions for relevance, innovation, clarity, and specificity.

Results: The gpt-4-1106-preview model achieved 90% recall in detecting inconsistent claim pairs within a ternary assertion setup. Notably, without explicit reasoning prompts, ChatGPT provided sound reasoning for the assertion relations between claims and hypotheses, grounded in an analysis of relevance, specificity, and certainty. ChatGPT's summaries of consensus and controversy in the clinical literature were comprehensive and factually consistent. The research questions proposed by ChatGPT received high expert ratings.

Discussion: Our experiment suggests that, in evaluating the relationship between evidence and claims, ChatGPT considered more detailed information than a straightforward assessment of sentiment orientation. This ability to process intricate information and reason scientifically about sentiment is noteworthy, particularly because the pattern emerged without explicit guidance or directives in the prompts, highlighting ChatGPT's inherent logical reasoning capabilities.

Conclusion: This study demonstrated ChatGPT's capacity to evaluate and interpret scientific claims. Such proficiency can be generalized to the broader clinical research literature. ChatGPT can effectively help facilitate clinical studies by proposing unresolved challenges based on analysis of existing studies. However, caution is advised: ChatGPT's outputs are inferences drawn from the input literature and could be harmful to clinical practice.
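The Results report recall for detecting inconsistent claim pairs under a ternary assertion scheme. As a minimal sketch of what such a recall figure measures, the Python below assumes hypothetical labels `support`, `neutral`, and `oppose`, and assumes a pair counts as inconsistent when one claim supports a hypothesis and the other opposes it. Neither the label names, the pairing rule, nor the helper functions come from the paper, and the toy data are invented for illustration.

```python
from typing import List, Tuple

# Hypothetical ternary label set (an assumption; the study's own labels may differ).
SUPPORT, NEUTRAL, OPPOSE = "support", "neutral", "oppose"
LABELS = {SUPPORT, NEUTRAL, OPPOSE}

Pair = Tuple[str, str]  # assertions of two claims toward the same hypothesis


def is_inconsistent(a: str, b: str) -> bool:
    """Assumed rule: a pair is inconsistent when one claim supports the
    hypothesis and the other opposes it; neutral never creates a conflict."""
    if a not in LABELS or b not in LABELS:
        raise ValueError(f"unknown label in pair: {a!r}, {b!r}")
    return {a, b} == {SUPPORT, OPPOSE}


def inconsistency_recall(gold: List[Pair], predicted: List[Pair]) -> float:
    """Recall = TP / (TP + FN), counted only over gold-inconsistent pairs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted lists must be the same length")
    tp = fn = 0
    for (g_a, g_b), (p_a, p_b) in zip(gold, predicted):
        if not is_inconsistent(g_a, g_b):
            continue  # consistent gold pairs do not affect recall
        if is_inconsistent(p_a, p_b):
            tp += 1  # conflict correctly detected
        else:
            fn += 1  # conflict missed
    if tp + fn == 0:
        raise ValueError("no gold-inconsistent pairs to score")
    return tp / (tp + fn)


# Invented toy data: 10 truly conflicting pairs (one missed) plus 5 agreeing pairs.
gold = [(SUPPORT, OPPOSE)] * 10 + [(SUPPORT, SUPPORT)] * 5
pred = [(SUPPORT, OPPOSE)] * 9 + [(SUPPORT, NEUTRAL)] + [(SUPPORT, SUPPORT)] * 5

print(inconsistency_recall(gold, pred))  # 0.9
```

Recall on its own ignores false alarms raised on agreeing pairs; reporting precision alongside it would show whether a model over-flags conflicts.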

Funders

National Key R&D Program for Young Scientists

National Natural Science Foundation of China

General funding of the China Postdoctoral Science Foundation

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/jamia/article-pdf/31/7/1551/58243695/ocae100.pdf

Cited by 2 articles.

1. Evaluating local open-source large language models for data extraction from unstructured reports on mechanical thrombectomy in patients with ischemic stroke. Journal of NeuroInterventional Surgery, 2024-08-02.

2. The emerging paradigm in pediatric rheumatology: harnessing the power of artificial intelligence. Rheumatology International, 2024-07-16.