CAME: Competitively Learning a Mixture-of-Experts Model for First-stage Retrieval

Published: 2024-07-22
ISSN: 1046-8188
Container-title: ACM Transactions on Information Systems
Language: en
Short-container-title: ACM Trans. Inf. Syst.

Author:

Guo Jiafeng¹, Cai Yinqiong¹, Bi Keping¹, Fan Yixing¹, Chen Wei¹, Zhang Ruqing¹, Cheng Xueqi¹

Affiliation:

1. Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China

Abstract

The first-stage retrieval aims to retrieve a subset of candidate documents from a huge collection both effectively and efficiently. Since various matching patterns can exist between queries and relevant documents, previous work tries to combine multiple retrieval models to find as many relevant results as possible. The constructed ensembles, whether learned independently or jointly, do not care which component model is more suitable to an instance during training. Thus, they cannot fully exploit the capabilities of different types of retrieval models in identifying diverse relevance patterns. Motivated by this observation, in this paper, we propose a Mixture-of-Experts (MoE) model consisting of representative matching experts and a novel competitive learning mechanism to let the experts develop and enhance their expertise during training. Specifically, our MoE model shares the bottom layers to learn common semantic representations and uses differently structured upper layers to represent various types of retrieval experts. Our competitive learning mechanism has two stages: (1) a standardized learning stage to train the experts equally to develop their capabilities to conduct relevance matching; (2) a specialized learning stage where the experts compete with each other on every training instance and get rewards and updates according to their performance to enhance their expertise on certain types of samples. Experimental results on retrieval benchmark datasets show that our method significantly outperforms the state-of-the-art baselines in the in-domain and out-of-domain settings.
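
The two-stage training schedule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the reward rule, so everything specific here is an assumption made for illustration: the function names `expert_weights` and `weighted_loss`, the `temperature` parameter, and the choice of a softmax over negative per-expert losses as the "reward". It only shows the general idea of weighting experts equally in the standardized stage and favoring the better-performing expert on each instance in the specialized stage.

```python
import math


def expert_weights(per_expert_losses, stage, temperature=1.0):
    """Return one loss weight per expert for a single training instance.

    stage="standardized": all experts are weighted equally.
    stage="specialized": experts with a lower loss on this instance get a
    larger weight (softmax over negative loss). This rule is a hypothetical
    stand-in for the paper's competitive reward, not the authors' formula.
    """
    n = len(per_expert_losses)
    if n == 0:
        raise ValueError("at least one expert loss is required")
    if stage == "standardized":
        return [1.0 / n] * n
    if stage == "specialized":
        scaled = [-loss / temperature for loss in per_expert_losses]
        peak = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scaled]
        total = sum(exps)
        return [e / total for e in exps]
    raise ValueError(f"unknown stage: {stage!r}")


def weighted_loss(per_expert_losses, stage, temperature=1.0):
    """Combine per-expert losses into one training loss for an instance."""
    weights = expert_weights(per_expert_losses, stage, temperature)
    return sum(w * loss for w, loss in zip(weights, per_expert_losses))


# Hypothetical losses of three experts on one query-document instance.
losses = [0.2, 1.0, 3.0]
uniform = expert_weights(losses, "standardized")
competitive = expert_weights(losses, "specialized")
```

In this sketch the expert with the lowest loss on an instance receives the largest share of the update in the specialized stage, so each expert is pushed toward the samples it already handles best. In an automatic-differentiation framework the weights would normally be treated as constants, so that gradients flow only through the per-expert losses.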

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3678880
