Microblog Retrieval Based on Concept-Enhanced Pre-Training Model

Authors:

Wang Yashen (1), Wang Zhaoyu (2), Zhang Huanhuan (3), Liu Zhirun (4)

Affiliations:

1. National Engineering Laboratory for Risk Perception and Prevention (RPP), China Academy of Electronics and Information Technology of CETC and Key Laboratory of Cognition and Intelligence Technology (CIT), Information Science Academy of CETC, Beijing, China

2. School of Information Sciences, University of Illinois at Urbana-Champaign, Urbana, IL

3. National Engineering Laboratory for Risk Perception and Prevention (RPP), China Academy of Electronics and Information Technology of CETC, Beijing, China

4. Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, School of Computer, Beijing Institute of Technology, Beijing, China

Abstract

Despite substantial interest in applying neural networks to information retrieval, neural ranking models have mostly been applied to conventional ad-hoc retrieval tasks over web pages and newswire articles. This article proposes a concept-enhanced pre-training model for the microblog retrieval task that leverages a Semantic Matching Model (SMM) objective and a Concept Correlation Model (CCM) objective. The proposed model is a novel neural ranking model designed specifically for ranking short-text microblogs. It combines the strength of pre-training in producing effective contextualized embeddings with the value of prior lexical knowledge (e.g., concept knowledge) for understanding the semantics of short texts. We conduct experiments on widely used real-world datasets, and the results demonstrate the effectiveness of the proposed model, even when compared with the latest state-of-the-art neural models and pre-training-based models.
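The abstract names the two pre-training objectives but does not say how each is defined or how they are combined. The sketch below is a minimal illustration under loudly stated assumptions: it treats SMM as a binary relevance loss on a query-microblog pair, treats CCM as a cross-entropy loss over a concept vocabulary, and sums them with a hypothetical weight `weight` (lambda). The function names, loss choices, and weighting scheme are illustrative and are not the authors' formulation.

```python
import math
from typing import List, Sequence


def binary_cross_entropy(prob: float, label: int, eps: float = 1e-12) -> float:
    """Hypothetical SMM-style loss for one query-microblog pair (label 1 = relevant)."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def concept_cross_entropy(logits: Sequence[float], gold_concept: int) -> float:
    """Hypothetical CCM-style loss: negative log-probability of the gold concept index."""
    probs = softmax(logits)
    return -math.log(max(probs[gold_concept], 1e-12))


def joint_loss(match_prob: float, relevance_label: int,
               concept_logits: Sequence[float], gold_concept: int,
               weight: float = 0.5) -> float:
    """Assumed weighted-sum multi-task objective: L = L_SMM + weight * L_CCM."""
    l_smm = binary_cross_entropy(match_prob, relevance_label)
    l_ccm = concept_cross_entropy(concept_logits, gold_concept)
    return l_smm + weight * l_ccm


if __name__ == "__main__":
    # Toy values: a pair scored 0.9 and labelled relevant, three candidate concepts.
    print(round(joint_loss(0.9, 1, [2.0, 0.5, -1.0], gold_concept=0), 4))
```

In this sketch, setting `weight` to 0 reduces the objective to pure relevance matching, while larger values push the shared encoder toward concept-aware representations; the actual balance used in the paper is not stated in the abstract.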

Funders

National Natural Science Foundation of China

New Generation of Artificial Intelligence Special Action

National Integrated Big Data Center Pilot Project

Joint Advanced Research Foundation of China Electronics Technology Group Corporation

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science


Cited by 4 articles.

1. Hierarchical Text Classification of Chinese Public Security Cases Based on ERNIE 3.0 Model;2024 5th International Conference on Computer Vision, Image and Deep Learning (CVIDL);2024-04-19

2. Short-Text Conceptualization Based on Hyper-Graph Learning and Multiple Prior Knowledge;Communications in Computer and Information Science;2023-11-15

3. Leveraging Concept-Driven Pre-Training Model for Shot-Text Conceptualization Task;2023 8th International Conference on Data Science in Cyberspace (DSC);2023-08-18

4. A Novel Concept-Driven Negative Sampling Mechanism for Enhancing Semanticity and Interpretability of Knowledge Graph Completion Task;2023 8th International Conference on Data Science in Cyberspace (DSC);2023-08-18
