Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges

Author:

Wang Jiajia (1), Huang Jimmy Xiangji (2), Tu Xinhui (3), Wang Junmei (4), Huang Angela Jennifer (5), Laskar Md Tahmid Rahman (6), Bhuiyan Amran (2)

Affiliation:

1. School of Sciences, Henan University of Technology, Zhengzhou, China

2. Information Retrieval and Knowledge Management Research Lab, York University, Toronto, Canada

3. School of Computer Science, Central China Normal University, Wuhan, China

4. School of Computer, Hangzhou Dianzi University, Hangzhou, China

5. Lassonde School of Engineering, York University, Toronto, Canada

6. York University & Dialpad Inc., Toronto, Canada

Abstract

Recent years have witnessed a substantial increase in the use of deep learning to solve various natural language processing (NLP) problems. Early deep learning models were constrained by their sequential or unidirectional nature, which made it difficult for them to capture contextual relationships across text inputs. The introduction of bidirectional encoder representations from transformers (BERT) provided a robust transformer encoder that can model broader context and deliver state-of-the-art performance across various NLP tasks. This has inspired researchers and practitioners to apply BERT to practical problems, such as information retrieval (IR). A survey that provides a comprehensive analysis of prevalent approaches that apply pretrained transformer encoders like BERT to IR can thus be useful for both academia and industry. In light of this, we revisit a variety of BERT-based methods in this survey, covering a wide range of IR techniques, and group them into six high-level categories: (i) handling long documents, (ii) integrating semantic information, (iii) balancing effectiveness and efficiency, (iv) predicting the weights of terms, (v) query expansion, and (vi) document expansion. We also provide links to resources, including datasets and toolkits, for BERT-based IR systems. Additionally, we highlight the advantages of employing encoder-based BERT models in contrast to recent large language models like ChatGPT, which are decoder-based and demand extensive computational resources. Finally, we summarize the comprehensive outcomes of the survey and suggest directions for future research in the area.
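As a concrete illustration of the kind of method the survey covers, below is a minimal sketch of BERT-based re-ranking: the cross-encoder setup in which BERT jointly reads a query-document pair and emits a relevance score. This is not code from the survey itself; it assumes the Hugging Face transformers library, and the checkpoint name cross-encoder/ms-marco-MiniLM-L-6-v2 is one publicly available MS MARCO-trained cross-encoder chosen purely for illustration.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed checkpoint: a public MS MARCO cross-encoder. Any BERT-style
# sequence-classification model trained for passage ranking works the same way.
MODEL_NAME = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def rerank(query, documents):
    # The encoder reads "[CLS] query [SEP] document [SEP]" for every pair
    # and outputs a single relevance logit per pair.
    inputs = tokenizer([query] * len(documents), documents,
                       padding=True, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        scores = model(**inputs).logits.squeeze(-1)
    # Return documents sorted from most to least relevant.
    return sorted(zip(documents, scores.tolist()),
                  key=lambda pair: pair[1], reverse=True)

docs = ["BERT is a bidirectional transformer encoder pretrained with masked language modeling.",
        "Toronto has mild weather in the spring."]
for doc, score in rerank("what is BERT", docs):
    print(f"{score:+.2f}  {doc}")

Note the truncation at 512 tokens: this hard input limit is precisely what motivates the survey's first category, handling long documents, while the cost of scoring every query-document pair with a full forward pass motivates the third, balancing effectiveness and efficiency.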

Funder

Natural Sciences and Engineering Research Council (NSERC) of Canada

York Research Chairs

Ontario Research Fund-Research Excellence

Publisher

Association for Computing Machinery (ACM)

