Benchmarking Large Language Models in Retrieval-Augmented Generation

Published:2024-03-24 Issue:16 Volume:38 Page:17754-17762
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Chen Jiawei, Lin Hongyu, Han Xianpei, Sun Le

Abstract

Retrieval-Augmented Generation (RAG) is a promising approach for mitigating the hallucination of large language models (LLMs). However, existing research lacks rigorous evaluation of the impact of retrieval-augmented generation on different large language models, which makes it challenging to identify the potential bottlenecks in the capabilities of RAG for different LLMs. In this paper, we systematically investigate the impact of Retrieval-Augmented Generation on large language models. We analyze the performance of different large language models in 4 fundamental abilities required for RAG: noise robustness, negative rejection, information integration, and counterfactual robustness. To this end, we establish the Retrieval-Augmented Generation Benchmark (RGB), a new corpus for RAG evaluation in both English and Chinese. RGB divides the instances within the benchmark into 4 separate testbeds based on the aforementioned fundamental abilities required to resolve the case. Then we evaluate 6 representative LLMs on RGB to diagnose the challenges of current LLMs when applying RAG. Evaluation reveals that while LLMs exhibit a certain degree of noise robustness, they still struggle significantly in terms of negative rejection, information integration, and dealing with false information. These assessment outcomes indicate that there is still a considerable journey ahead to effectively apply RAG to LLMs.

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Cited by 7 articles.

1. The influence of ChatGPT on student engagement: A systematic review and future research agenda;Computers & Education;2024-10

2. Towards Retrieval Augmented Generation over Large Video Libraries;2024 16th International Conference on Human System Interaction (HSI);2024-07-08

3. Anchor Your Embeddings Through the Storm: Mitigating Instance-to-Document Semantic Gap;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

4. Evaluation of LLM Tools for Feedback Generation in a Course on Concurrent Programming;International Journal of Artificial Intelligence in Education;2024-05-15

5. Introduction to Large Language Models (LLMs) for dementia care and research;Frontiers in Dementia;2024-05-14