Utilizing passage‐level relevance and kernel pooling for enhancing BERT‐based document reranking

Authors:

Pan Min¹, Zhou Shuting¹, Li Teng¹, Liu Yu¹, Pei Quanli², Huang Angela J.³, Huang Jimmy X.²

Affiliations:

1. College of Computer and Information Engineering, Hubei Normal University, Huangshi, China

2. School of Information Technology, York University, Toronto, Canada

3. Lassonde School of Engineering, York University, Toronto, Ontario, Canada

Abstract

The pre-trained language model (PLM) based on the Transformer encoder, namely BERT, has achieved state-of-the-art results in the field of Information Retrieval. Existing BERT-based ranking models divide documents into passages and aggregate passage-level relevance to rank the document list. However, these common score aggregation strategies cannot capture important semantic information such as document structure, and they have not been studied extensively. In this article, we propose a novel kernel-based score pooling system that captures document-level relevance by aggregating passage-level relevance. In particular, we propose and study several representative kernel pooling functions and several document ranking strategies based on passage-level relevance. Our proposed framework, KnBERT, naturally incorporates kernel functions at the passage level into the BERT-based re-ranking method, which provides a promising avenue for building universal retrieve-then-rerank information retrieval systems. Experiments conducted on the two widely used TREC Robust04 and GOV2 test collections show that KnBERT achieves significant improvements over other BERT-based ranking approaches in terms of MAP, P@20, and NDCG@20, with no additional, or even less, computation.
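For intuition, the following is a minimal sketch of the kernel pooling idea summarized above: passage-level relevance scores produced by a BERT re-ranker are soft-binned by a set of Gaussian kernels, and the resulting pooled features are combined into a single document-level score. The kernel centres (mus), width (sigma), combination weights, and the function name kernel_pool are illustrative assumptions, not the paper's actual KnBERT configuration or code.

import numpy as np

def kernel_pool(passage_scores, mus=(0.1, 0.3, 0.5, 0.7, 0.9), sigma=0.1, weights=None):
    # Each Gaussian kernel softly counts how many passage-level scores fall
    # near its centre mu; the log-scaled soft counts are then combined
    # linearly into a document-level relevance score.
    scores = np.asarray(passage_scores, dtype=float)            # shape: (num_passages,)
    soft_counts = np.array([
        np.exp(-((scores - mu) ** 2) / (2.0 * sigma ** 2)).sum()
        for mu in mus
    ])
    features = np.log1p(soft_counts)                            # squash large counts
    if weights is None:
        weights = np.full(len(features), 1.0 / len(features))   # placeholder weights
    return float(features @ weights)

# Example: passage scores from a BERT re-ranker for one document (made-up values).
print(kernel_pool([0.92, 0.35, 0.10, 0.88]))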

Funders

National Natural Science Foundation of China

Natural Sciences and Engineering Research Council of Canada

Publisher

Wiley

