Use of Retrieval-Augmented Large Language Model for COVID-19 Fact-Checking (Preprint)

Authors:

Li Hai, Huang Jingyi, Ji Mengmeng, Yang Yuyi, An Ruopeng

Abstract

BACKGROUND

The COVID-19 pandemic has been accompanied by an “infodemic,” where the rapid spread of misinformation has exacerbated public health challenges. Traditional fact-checking methods, though effective, are time-consuming and resource-intensive, limiting their ability to combat misinformation at scale. Large language models (LLMs) like GPT-4 offer a more scalable solution, but their susceptibility to generating hallucinations—plausible yet incorrect information—compromises their reliability.

OBJECTIVE

This study aims to enhance the accuracy and reliability of COVID-19 fact-checking by integrating a retrieval-augmented generation (RAG) system with LLMs, specifically addressing the limitations of hallucination and context inaccuracy inherent in standalone LLMs.

METHODS

We constructed a context dataset comprising approximately 130,000 peer-reviewed articles related to COVID-19 from PubMed and Scopus. This dataset was integrated with GPT-4 to develop four RAG-enhanced models: Naive RAG, LOTR-RAG, Corrective RAG (CRAG), and Self-RAG (SRAG). The RAG systems retrieved relevant external information, which was embedded and indexed in a vector store for similarity search. Two datasets, one real-world and one synthesized, each containing 500 claims, were used to evaluate the models. Each model's accuracy, F1 score, precision, and sensitivity were compared to assess its effectiveness in reducing hallucinations and improving fact-checking accuracy.
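The authors' implementation is not reproduced in this abstract. The sketch below is a minimal illustration of a Naive RAG fact-checking loop of the kind described: embed a corpus, retrieve the passages most similar to a claim, and ask GPT-4 for a verdict grounded in that context. It assumes the OpenAI Python client, uses plain cosine similarity in place of a dedicated vector store, and the function names, prompt wording, and embedding model choice are illustrative assumptions, not the paper's.

```python
# Minimal Naive RAG fact-checking sketch (illustrative; not the authors' code).
# Assumes: `pip install openai numpy` and an OPENAI_API_KEY in the environment.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Embed texts with an OpenAI embedding model (model choice is an assumption)."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Tiny stand-in for the ~130,000 PubMed/Scopus articles used as the context dataset.
corpus = [
    "mRNA COVID-19 vaccines do not alter human DNA.",
    "Randomized trials found no benefit of ivermectin for COVID-19 outcomes.",
]
corpus_vecs = embed(corpus)
corpus_vecs /= np.linalg.norm(corpus_vecs, axis=1, keepdims=True)  # pre-normalize

def fact_check(claim: str, k: int = 2) -> str:
    """Retrieve the k most similar passages, then ask GPT-4 for a verdict."""
    q = embed([claim])[0]
    q /= np.linalg.norm(q)
    scores = corpus_vecs @ q                                 # cosine similarity
    context = "\n".join(corpus[i] for i in np.argsort(scores)[::-1][:k])
    chat = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Label the claim TRUE or FALSE using only the context, "
                        "and explain briefly."},
            {"role": "user", "content": f"Context:\n{context}\n\nClaim: {claim}"},
        ],
    )
    return chat.choices[0].message.content

print(fact_check("COVID-19 vaccines change your DNA."))
```

The agentic variants (CRAG and SRAG) would add a step around this loop, such as grading or correcting the retrieved context before the verdict; that logic is omitted here.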

RESULTS

On the real-world dataset, the baseline GPT-4 model achieved an accuracy of 0.856. The Naive RAG model improved this to 0.946, and the LOTR-RAG model further increased accuracy to 0.951. The CRAG and SRAG models outperformed all others, achieving accuracies of 0.972 and 0.973, respectively. On the synthesized dataset, the baseline GPT-4 model reached an accuracy of 0.960; the Naive RAG model increased this to 0.972, and the LOTR-RAG, CRAG, and SRAG models each achieved 0.978. These findings show that the RAG-enhanced models consistently maintained high accuracy, closely mirroring ground-truth labels and significantly reducing hallucinations. The CRAG and SRAG models also provided more detailed and contextually accurate explanations, further supporting the superiority of agentic RAG frameworks for delivering reliable and precise fact-checking across diverse datasets.
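For reference, the four reported metrics can be computed from model verdicts against ground-truth labels as sketched below. The claims and labels are placeholders, since the evaluation data are not reproduced in the abstract, and treating "claim is true" as the positive class is an assumption.

```python
# Computing the evaluation metrics reported in the study (placeholder data).
# Assumes scikit-learn; "sensitivity" is recall, here with label 1 ("claim is
# true") taken as the positive class -- an assumption, as the abstract does
# not state which class is positive.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # ground-truth labels for 8 example claims
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]  # a model's verdicts on the same claims

print(f"accuracy:    {accuracy_score(y_true, y_pred):.3f}")   # 0.625
print(f"precision:   {precision_score(y_true, y_pred):.3f}")  # 0.600
print(f"sensitivity: {recall_score(y_true, y_pred):.3f}")     # 0.750
print(f"F1 score:    {f1_score(y_true, y_pred):.3f}")         # 0.667
```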

CONCLUSIONS

The integration of RAG systems with LLMs offers a robust approach to enhancing the accuracy and contextual relevance of automated fact-checking, making it a valuable tool in combating misinformation during public health crises. While the results are promising, the study acknowledges certain limitations, including the dependence on the quality of the external datasets and the potential biases in the retrieved documents. Future research should focus on refining these models, exploring their applicability across various domains, and ensuring ethical deployment. Policy implications include the need for guidelines and frameworks supporting AI-driven fact-checking systems while maintaining transparency, accuracy, and public trust.

Publisher

JMIR Publications Inc.
