Affiliation:
1. College of Computer and Information Engineering, Hubei Normal University, Huangshi, China
2. School of Information Technology, York University, Toronto, Canada
3. Lassonde School of Engineering, York University, Toronto, Ontario, Canada
Abstract
The pre‐trained language model (PLM) based on the Transformer encoder, namely BERT, has achieved state‐of‐the‐art results in the field of Information Retrieval. Existing BERT‐based ranking models divide documents into passages and aggregate passage‐level relevance to rank the document list. However, these common score aggregation strategies cannot capture important semantic information such as document structure and have not been extensively studied. In this article, we propose a novel kernel‐based score pooling system that captures document‐level relevance by aggregating passage‐level relevance. In particular, we propose and study several representative kernel pooling functions and several different document ranking strategies based on passage‐level relevance. Our proposed framework, KnBERT, naturally incorporates kernel functions at the passage level into the BERT‐based re‐ranking method, which provides a promising avenue for building universal retrieve‐then‐rerank information retrieval systems. Experiments conducted on the two widely used TREC Robust04 and GOV2 test collections show that KnBERT achieves significant improvements over other BERT‐based ranking approaches in terms of MAP, P@20, and NDCG@20, with no extra, or even less, computation.
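To make the aggregation idea concrete, the sketch below illustrates one plausible form of kernel‐based score pooling: passage‐level relevance scores from a BERT re‐ranker are soft‐counted by a set of Gaussian kernels and combined into a single document‐level score. The kernel centres, width, and uniform combination weights here are assumptions for illustration, not the exact configuration used by KnBERT.

```python
import math


def kernel_pool(passage_scores, mus=None, sigma=0.1):
    """Aggregate passage-level relevance scores into a document-level score
    via Gaussian kernel pooling (illustrative sketch; kernel settings and
    the uniform combination below are assumptions, not the paper's setup)."""
    if mus is None:
        # Assumed kernel centres spread over a normalized score range [0, 1].
        mus = [0.1 * k for k in range(11)]
    # Soft-count how many passages score near each kernel centre.
    features = [
        sum(math.exp(-((s - mu) ** 2) / (2 * sigma ** 2)) for s in passage_scores)
        for mu in mus
    ]
    # Log-scaled soft counts summed with uniform weights; a learned linear
    # layer would normally replace this uniform combination.
    return sum(math.log1p(f) for f in features)


# Example: passage scores produced by a BERT re-ranker for one document.
doc_score = kernel_pool([0.92, 0.41, 0.77, 0.10])
print(doc_score)
```

In a retrieve‐then‐rerank pipeline, such a pooling function would be applied per document to the passage scores of the candidate list, and documents would then be re‐ranked by the pooled score.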
Funder
National Natural Science Foundation of China
Natural Sciences and Engineering Research Council of Canada