FIT-RAG: Black-Box RAG with Factual Information and Token Reduction

Published: 2024-07-09
ISSN: 1046-8188
Container-title: ACM Transactions on Information Systems
Language: en
Short-container-title: ACM Trans. Inf. Syst.

Author:

Mao Yuren¹, Dong Xuemei¹, Xu Wenyi¹, Gao Yunjun¹, Wei Bin¹, Zhang Ying²

Affiliation:

1. Zhejiang University, China

2. Zhejiang Gongshang University, China

Abstract

Because Large Language Models (LLMs) have an extraordinarily large number of parameters, fine-tuning them to update long-tail or out-of-date knowledge is impractical in many applications. To avoid fine-tuning, we can instead treat an LLM as a black box (i.e., freeze its parameters) and augment it with a Retrieval-Augmented Generation (RAG) system, an approach known as black-box RAG. Recently, black-box RAG has achieved success in knowledge-intensive tasks and has gained much attention. Existing black-box RAG methods typically fine-tune the retriever to cater to LLMs’ preferences and concatenate all the retrieved documents as the input, which suffers from two issues: (1) Ignorance of Factual Information. The documents preferred by the LLM may not contain the factual information for the given question, which can mislead the retriever and hurt the effectiveness of black-box RAG; (2) Waste of Tokens. Simply concatenating all the retrieved documents introduces large numbers of unnecessary tokens for LLMs, which degrades the efficiency of black-box RAG. To address these issues, this paper proposes a novel black-box RAG framework, dubbed FIT-RAG, which utilizes the factual information in the retrieval and reduces the number of tokens for augmentation. FIT-RAG utilizes the factual information by constructing a bi-label document scorer which takes the factual information and LLMs’ preferences as labels, respectively. In addition, it reduces the tokens by introducing a self-knowledge recognizer and a sub-document-level token reducer, which enable FIT-RAG to avoid unnecessary augmentation and reduce augmentation tokens as much as possible. FIT-RAG achieves both superior effectiveness and efficiency, as validated by extensive experiments across three open-domain question-answering datasets: TriviaQA, NQ and PopQA. FIT-RAG can improve the answering accuracy of Llama2-13B-Chat by 14.3% on TriviaQA, 19.9% on NQ and 27.5% on PopQA, respectively. Furthermore, it can save approximately half of the tokens on average across the three datasets.
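
The control flow the abstract describes (skip augmentation when the LLM already knows the answer, otherwise score documents on two labels and trim them at sub-document level) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name here (`ScoredDoc`, `build_prompt`, `knows_answer`, `retrieve_and_score`, `token_budget`) is hypothetical, and summing the two label scores, splitting on sentences, counting whitespace-separated words as tokens, and filling a greedy budget are all assumptions made only for illustration; the abstract does not specify any of these details.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScoredDoc:
    text: str
    has_answer: float  # assumed score for the "contains factual information" label
    llm_prefer: float  # assumed score for the "LLM preference" label


def split_sub_documents(text: str) -> List[str]:
    # Assumption: a sub-document is a sentence; the paper may split differently.
    return [s.strip() for s in text.split(".") if s.strip()]


def build_prompt(
    question: str,
    knows_answer: Callable[[str], bool],                  # stand-in for a self-knowledge recognizer
    retrieve_and_score: Callable[[str], List[ScoredDoc]],  # stand-in for retriever + bi-label scorer
    token_budget: int,
    top_k: int = 2,
) -> str:
    # Step 1: avoid augmentation entirely if the LLM is judged to know the answer.
    if knows_answer(question):
        return f"Question: {question}"

    # Step 2: rank documents using both labels (summing them is an assumption).
    docs = retrieve_and_score(question)
    ranked = sorted(docs, key=lambda d: d.has_answer + d.llm_prefer, reverse=True)[:top_k]

    # Step 3: keep sub-documents greedily while they fit the budget
    # (whitespace-separated words stand in for real tokens).
    kept: List[str] = []
    used = 0
    for doc in ranked:
        for sub in split_sub_documents(doc.text):
            n = len(sub.split())
            if used + n <= token_budget:
                kept.append(sub)
                used += n

    context = "\n".join(kept)
    return f"{context}\n\nQuestion: {question}"
```

In this sketch the prompt for a question the recognizer marks as known contains no retrieved text at all, which is where a saving in augmentation tokens would come from; the real recognizer, scorer, and reducer are learned components described in the paper itself.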

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3676957
