Enhancing Answer Selection via Ad-Hoc Knowledge Extraction from Unstructured Web Texts
Published:2023-05-13
Issue:06
Volume:33
Page:933-951
ISSN:0218-1940
Container-title:International Journal of Software Engineering and Knowledge Engineering
language:en
Short-container-title:Int. J. Soft. Eng. Knowl. Eng.
Author:
Gu Shengwei (1, 2),
Luo Xiangfeng (1),
Wang Hao (1)
Affiliation:
1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, P. R. China
2. School of Computer and Information Engineering, Chuzhou University, Chuzhou 239000, P. R. China
Abstract
Answer selection aims to identify the most relevant answers to a given question from a set of candidates. It is a fundamental component of intelligent question answering systems. To improve performance, integrating external structured knowledge bases (KBs) into answer selection models has gradually become an effective strategy. Because such KBs are expensive to construct and maintain, these models suffer from domain barriers and incomplete information. In this paper, we propose a two-stage extraction–comprehension answer selection model that extracts ad-hoc knowledge from unstructured web texts to enhance answer selection. In the extraction stage, two types of snippets are extracted from unstructured web pages and used as the source of ad-hoc knowledge. In the comprehension stage, a selective attention mechanism extracts and integrates ad-hoc knowledge from the multiple text snippets obtained in the first stage, which enriches the representation of question–answer pairs and identifies the correct answers more accurately. By incorporating ad-hoc knowledge extracted from both types of snippets, the proposed model achieves state-of-the-art results on two publicly available benchmark datasets. In particular, on WikiQA, in terms of the two evaluation metrics (mean average precision and mean reciprocal rank), it scores 9.9% and 8.4% higher than previous non-pretraining-based models, and 3.4% and 3.2% higher than pretraining-based models.
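The two evaluation metrics named in the abstract, mean average precision (MAP) and mean reciprocal rank (MRR), can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names, the toy scores and labels, and the choice to skip questions that have no correct candidate are assumptions made here for illustration.

```python
from typing import List, Sequence, Tuple


def rank_labels(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Return the 0/1 relevance labels ordered by descending model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def average_precision(ranked: Sequence[int]) -> float:
    """Mean of precision@k taken at each rank k that holds a correct answer."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(ranked: Sequence[int]) -> float:
    """1 / rank of the first correct answer, or 0.0 if there is none."""
    for rank, label in enumerate(ranked, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def map_and_mrr(
    questions: Sequence[Tuple[Sequence[float], Sequence[int]]]
) -> Tuple[float, float]:
    """Average AP and RR over questions given as (scores, labels) pairs.

    Assumption: questions without any correct candidate are skipped.
    """
    ranked = [rank_labels(scores, labels) for scores, labels in questions]
    ranked = [r for r in ranked if any(r)]
    if not ranked:
        return 0.0, 0.0
    mean_ap = sum(average_precision(r) for r in ranked) / len(ranked)
    mean_rr = sum(reciprocal_rank(r) for r in ranked) / len(ranked)
    return mean_ap, mean_rr


# Hypothetical toy data: two questions, each with candidate scores and labels.
toy_questions = [
    ([0.9, 0.2, 0.5], [0, 1, 1]),  # correct answers land at ranks 2 and 3
    ([0.1, 0.8], [0, 1]),          # correct answer lands at rank 1
]
toy_map, toy_mrr = map_and_mrr(toy_questions)
```

On this toy input the first question has average precision (1/2 + 2/3) / 2 and reciprocal rank 1/2, and the second has 1 for both, so MAP is about 0.792 and MRR is 0.75.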
Funder
Shanghai Outstanding Academic Leaders Plan
National Key Research and Development Program of China
National Natural Science Foundation of China
Shanghai Science and Technology Young Talents Sailing Program
Publisher
World Scientific Pub Co Pte Ltd
Subject
Artificial Intelligence, Computer Graphics and Computer-Aided Design, Computer Networks and Communications, Software