FIT-RAG: Black-Box RAG with Factual Information and Token Reduction

Authors:

Yuren Mao¹, Xuemei Dong¹, Wenyi Xu¹, Yunjun Gao¹, Bin Wei¹, Ying Zhang²

Affiliation:

1. Zhejiang University, China

2. Zhejiang Gongshang University, China

Abstract

Due to the extraordinarily large number of parameters, fine-tuning Large Language Models (LLMs) to update long-tail or out-of-date knowledge is impractical in many applications. To avoid fine-tuning, we can alternatively treat an LLM as a black box (i.e., freeze the parameters of the LLM) and augment it with a Retrieval-Augmented Generation (RAG) system, namely black-box RAG. Recently, black-box RAG has achieved success in knowledge-intensive tasks and has gained much attention. Existing black-box RAG methods typically fine-tune the retriever to cater to LLMs’ preferences and concatenate all the retrieved documents as the input, which suffers from two issues: (1) Ignorance of Factual Information. The LLM-preferred documents may not contain the factual information needed to answer the given question, which can mislead the retriever and hurt the effectiveness of black-box RAG; (2) Waste of Tokens. Simply concatenating all the retrieved documents feeds large numbers of unnecessary tokens to the LLM, which degrades the efficiency of black-box RAG. To address these issues, this paper proposes a novel black-box RAG framework, dubbed FIT-RAG, which exploits factual information during retrieval and reduces the number of tokens used for augmentation. FIT-RAG exploits factual information by constructing a bi-label document scorer that uses factual information and LLMs’ preferences as its two labels. In addition, it reduces token usage by introducing a self-knowledge recognizer and a sub-document-level token reducer, which enable FIT-RAG to avoid unnecessary augmentation and to reduce augmentation tokens as much as possible. FIT-RAG achieves both superior effectiveness and efficiency, which is validated by extensive experiments across three open-domain question-answering datasets: TriviaQA, NQ and PopQA. FIT-RAG improves the answering accuracy of Llama2-13B-Chat by 14.3% on TriviaQA, 19.9% on NQ and 27.5% on PopQA, respectively. Furthermore, it saves approximately half of the tokens on average across the three datasets.
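To make the pipeline concrete, below is a minimal Python sketch of the workflow the abstract describes. All names, signatures, and the top-k cutoff are hypothetical illustrations assumed for this sketch; the paper's actual bi-label scorer, self-knowledge recognizer, and token reducer are trained models, not the stub callables used here.

```python
# A minimal sketch of the FIT-RAG workflow described in the abstract.
# Every component below is a hypothetical stub; in the paper these are
# learned models operating around a frozen, black-box LLM.

from dataclasses import dataclass
from typing import Callable


@dataclass
class ScoredDoc:
    text: str
    factual_score: float     # label 1: does the doc carry the needed fact?
    preference_score: float  # label 2: does the LLM "prefer" this doc?


def fit_rag_answer(
    question: str,
    retrieve: Callable[[str], list[str]],                   # retriever (assumed)
    bi_label_score: Callable[[str, str], tuple[float, float]],
    has_self_knowledge: Callable[[str], bool],              # self-knowledge recognizer
    reduce_tokens: Callable[[str, list[ScoredDoc]], str],   # sub-document token reducer
    llm: Callable[[str], str],                              # frozen black-box LLM
) -> str:
    # 1. Skip augmentation entirely when the LLM already knows the answer,
    #    avoiding unnecessary retrieval tokens.
    if has_self_knowledge(question):
        return llm(question)

    # 2. Score each retrieved document on both labels (factual + preference).
    docs = [
        ScoredDoc(text, *bi_label_score(question, text))
        for text in retrieve(question)
    ]

    # 3. Keep the highest-scoring documents (top-5 is an arbitrary choice
    #    for this sketch), then compress them to sub-documents so fewer
    #    tokens reach the LLM.
    kept = sorted(
        docs,
        key=lambda d: d.factual_score + d.preference_score,
        reverse=True,
    )[:5]
    context = reduce_tokens(question, kept)

    # 4. Augment the frozen LLM with the compressed context.
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```

The key design point the sketch reflects is that the LLM itself is never updated: all adaptation happens in the lightweight components around it, and tokens are spent only when the self-knowledge check says augmentation is actually needed.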

Publisher

Association for Computing Machinery (ACM)

