Affiliations:
1. University of Minnesota
2. Yale University
3. Temple University
4. Weill Cornell Medicine
Abstract
Purpose:
Large Language Models (LLMs) have shown exceptional performance in various natural language processing tasks, benefiting from their language generation capabilities and ability to acquire knowledge from unstructured text. However, in the biomedical domain, LLMs face limitations that lead to inaccurate and inconsistent answers. Knowledge Graphs (KGs) have emerged as valuable resources for organizing structured information. Biomedical Knowledge Graphs (BKGs) have gained significant attention for managing diverse and large-scale biomedical knowledge. The objective of this study is to assess and compare the capabilities of ChatGPT and existing BKGs in question-answering, biomedical knowledge discovery, and reasoning tasks within the biomedical domain.
Methods:
We conducted a series of experiments to assess the performance of ChatGPT and the BKGs in querying existing biomedical knowledge, knowledge discovery, and knowledge reasoning. First, we tasked ChatGPT with answering questions sourced from the "Alternative Medicine" sub-category of Yahoo! Answers and recorded its responses; we also queried the BKG to retrieve the knowledge records relevant to the same questions and assessed both sets of answers manually. In a second experiment, we formulated a prediction scenario to assess ChatGPT's ability to suggest potential drug/dietary supplement repurposing candidates, and in parallel used the BKG to perform link prediction for the same task; the outcomes of ChatGPT and the BKG were then compared and analyzed. Finally, we evaluated the capabilities of ChatGPT and the BKG in establishing associations between pairs of proposed entities, in order to assess their reasoning abilities and the extent to which they can infer connections within the knowledge domain.
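For illustration, a minimal Python sketch of the two BKG-based components described above (triple lookup and TransE-style link prediction) is given below. The triples, entity names, relation names, and embeddings are hypothetical placeholders and do not reflect the actual BKG, models, or data used in the study.

    import numpy as np

    # Toy BKG as (head, relation, tail) triples -- illustrative placeholders only.
    triples = [
        ("fish_oil", "may_treat", "hypertriglyceridemia"),
        ("vitamin_d", "may_treat", "osteoporosis"),
        ("melatonin", "may_treat", "insomnia"),
    ]

    def query_bkg(entity, relation):
        """Return all tails linked to `entity` via `relation` (simple triple lookup)."""
        return [t for h, r, t in triples if h == entity and r == relation]

    print(query_bkg("fish_oil", "may_treat"))  # ['hypertriglyceridemia']

    # TransE-style link prediction: score(h, r, t) = -||h + r - t||.
    # Embeddings are random here; in practice they would be trained on the BKG.
    rng = np.random.default_rng(0)
    dim = 16
    entities = sorted({x for h, _, t in triples for x in (h, t)} | {"cognitive_decline"})
    ent_emb = {e: rng.normal(size=dim) for e in entities}
    rel_emb = {"may_treat": rng.normal(size=dim)}

    def score(head, relation, tail):
        return -np.linalg.norm(ent_emb[head] + rel_emb[relation] - ent_emb[tail])

    # Rank candidate supplements for a hypothetical target condition.
    candidates = ["fish_oil", "vitamin_d", "melatonin"]
    ranked = sorted(candidates,
                    key=lambda c: score(c, "may_treat", "cognitive_decline"),
                    reverse=True)
    print(ranked)

In this sketch, triple lookup stands in for querying existing knowledge from the BKG, while the scoring function stands in for the link-prediction step used to propose repurposing candidates.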
Results:
The results indicate that ChatGPT with GPT-4 outperforms both GPT-3.5 and the BKGs in providing existing information, whereas the BKGs demonstrate higher reliability in terms of information accuracy. Compared with the BKGs, ChatGPT shows clear limitations in novel discovery and reasoning, particularly in establishing structured links between entities.
Conclusions:
To address the limitations observed, future research should focus on integrating LLMs and BKGs to leverage the strengths of both approaches. Such integration would optimize task performance and mitigate potential risks, leading to advancements in knowledge within the biomedical field and contributing to the overall well-being of individuals.
Publisher
Research Square Platform LLC
Cited by 5 articles.