Identifying Notable Tuples in Multi-Concept Web Tables
-
Published:2023-04
Issue:04
Volume:33
Page:575-602
-
ISSN:0218-1940
-
Container-title:International Journal of Software Engineering and Knowledge Engineering
-
language:en
-
Short-container-title:Int. J. Soft. Eng. Knowl. Eng.
Affiliation:
1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, P. R. China
Abstract
Identifying notable tuples in a web table is of great help for table understanding and table summarization. However, existing methods based on document-internal features are inappropriate for identifying notable tuples in web tables. Additionally, for a web table that describes multiple concepts, evaluating the notability of a tuple needs to take into account the multiple entities in the tuple as well as their importance. In this paper, we investigate the task of identifying notable tuples in a multi-concept web table and propose a framework that comprises three tasks: (1) identify multiple entity columns and their importance weights by building a column correlation graph based on the types and relationships in the table; (2) obtain fine-grained entity notability scores based on an entity link graph, and provide solutions for entity linking failures and for the neglect of entity domains; and (3) evaluate tuple notability as a weighted sum of the notability scores of all entities in the tuple. We evaluate our approach comprehensively on real-world web tables. The results demonstrate that our approach outperforms the state-of-the-art baselines by 4.6% in the precision of detecting multiple entity columns and by 12.5% in normalized discounted cumulative gain (NDCG) for evaluating entity notability.
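Task (3) of the abstract, the weighted sum, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the column names, the weights and the scores are all hypothetical, and treating a missing entity score as 0.0 is an assumption made here only to keep the example runnable. The abstract does not say how the column weights or entity scores are scaled, so no normalization is applied.

```python
from typing import Dict, List


def tuple_notability(entity_scores: Dict[str, float],
                     column_weights: Dict[str, float]) -> float:
    """Weighted sum of the entity notability scores in one tuple.

    entity_scores maps an entity column name to the notability score of
    the entity found in that column of the tuple. column_weights maps an
    entity column name to its importance weight. A column with no score
    (for example, after a failed entity link) contributes 0.0 here, which
    is an assumption of this sketch and not a rule from the paper.
    """
    return sum(weight * entity_scores.get(column, 0.0)
               for column, weight in column_weights.items())


def rank_tuples(rows: List[Dict[str, float]],
                column_weights: Dict[str, float]) -> List[int]:
    """Return row indices ordered from most to least notable."""
    scores = [tuple_notability(row, column_weights) for row in rows]
    return sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)


# Hypothetical two-concept table with made-up weights and scores.
weights = {"film": 0.6, "director": 0.4}
rows = [
    {"film": 0.2, "director": 0.8},
    {"film": 0.9, "director": 0.5},
]
ranking = rank_tuples(rows, weights)
```

In this made-up example the second row ranks first, because its entity in the more heavily weighted column has the higher score.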
Funder
National Key R&D Program of China
Publisher
World Scientific Pub Co Pte Ltd
Subject
Artificial Intelligence, Computer Graphics and Computer-Aided Design, Computer Networks and Communications, Software