Abstract
While Large Language Models (LLMs) have shown potential across diverse language tasks, their application in healthcare must minimize diagnostic errors and prevent patient harm. A medical Knowledge Graph (KG) houses a wealth of structured medical concept relations sourced from authoritative references, such as UMLS, making it a valuable resource for grounding LLMs’ diagnostic process in knowledge. In this paper, we examine the synergistic potential of LLMs and medical KGs in predicting diagnoses from electronic health records (EHRs), under the framework of retrieval-augmented generation (RAG). We propose a novel graph model, Dr.Knows, that selects the most relevant pathology knowledge paths based on the medical problem descriptions. To evaluate Dr.Knows, we developed the first comprehensive human evaluation approach to assess the performance of LLMs for diagnosis prediction and examine the rationale behind their decision-making processes, aimed at improving diagnostic safety. Using real-world hospital datasets, our study enriches the discourse on the role of medical KGs in grounding medical knowledge into LLMs, revealing both challenges and opportunities in harnessing external knowledge for explainable diagnostic pathways and the realization of AI-augmented diagnostic decision support systems.
Publisher
Cold Spring Harbor Laboratory
Cited by
4 articles.