Exploiting neighborhood knowledge for single document summarization and keyphrase extraction
Published: 2010-05
Issue: 2
Volume: 28
Page: 1-34
ISSN: 1046-8188
Container-title: ACM Transactions on Information Systems
Language: en
Short-container-title: ACM Trans. Inf. Syst.
Author:
Wan Xiaojun (1),
Xiao Jianguo (1)
Affiliation:
1. Peking University, Beijing, China
Abstract
Document summarization and keyphrase extraction are two related tasks in the IR and NLP fields, and both of them aim at extracting condensed representations from a single text document. Existing methods for single document summarization and keyphrase extraction usually make use of only the information contained in the specified document. This article proposes using a small number of nearest neighbor documents to improve document summarization and keyphrase extraction for the specified document, under the assumption that the neighbor documents could provide additional knowledge and more clues. The specified document is expanded to a small document set by adding a few neighbor documents close to the document, and the graph-based ranking algorithm is then applied on the expanded document set to make use of both the local information in the specified document and the global information in the neighbor documents. Experimental results on the Document Understanding Conference (DUC) benchmark datasets demonstrate the effectiveness and robustness of our proposed approaches. The cross-document sentence relationships in the expanded document set are validated to be beneficial to single document summarization, and the word cooccurrence relationships in the neighbor documents are validated to be very helpful to single document keyphrase extraction.
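The pipeline outlined in the abstract (expand the specified document with a few nearest neighbors, then run a graph-based ranking over the expanded set) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the function names, the TF-IDF weighting, cosine similarity as the edge weight, the naive period-based sentence splitter, the damping value, and the PageRank-style iteration are all hypothetical choices. In particular, the sketch treats within-document and cross-document edges identically, which the article may not do.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]

def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (a dict) per input text."""
    counts = [Counter(tokenize(t)) for t in texts]
    n = len(texts)
    df = Counter(w for c in counts for w in c)  # document frequency per word
    return [{w: tf * math.log(1 + n / df[w]) for w, tf in c.items()}
            for c in counts]

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest_neighbors(target, corpus, k):
    """Pick the k corpus documents most similar to the target document."""
    vecs = tfidf_vectors([target] + corpus)
    order = sorted(range(len(corpus)),
                   key=lambda i: cosine(vecs[0], vecs[i + 1]),
                   reverse=True)
    return [corpus[i] for i in order[:k]]

def split_sentences(doc):
    """Naive splitter (assumption): sentences end at a period."""
    return [s.strip() for s in doc.split(".") if s.strip()]

def rank_sentences(target, corpus, k=2, damping=0.85, iterations=50):
    """Rank the target's sentences using a graph over the expanded set."""
    expanded = [target] + nearest_neighbors(target, corpus, k)
    sentences = [s for d in expanded for s in split_sentences(d)]
    n_target = len(split_sentences(target))  # target sentences come first
    vecs = tfidf_vectors(sentences)
    n = len(sentences)
    # Edge weights: cosine similarity between sentences, no self-loops.
    w = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    # Row-normalize; rows with no edges stay all-zero.
    for row in w:
        total = sum(row)
        if total:
            row[:] = [x / total for x in row]
    scores = [1.0 / n] * n
    for _ in range(iterations):  # PageRank-style power iteration
        scores = [(1 - damping) / n
                  + damping * sum(scores[j] * w[j][i] for j in range(n))
                  for i in range(n)]
    ranked = sorted(range(n_target), key=lambda i: scores[i], reverse=True)
    return [(sentences[i], scores[i]) for i in ranked]
```

A usage call such as `rank_sentences(doc, other_docs, k=2)` returns only the sentences of `doc`, ordered by score. Sentences from the neighbor documents take part in the graph and influence the scores, but they are never returned as summary candidates.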
Funder
Ministry of Education of the People's Republic of China
Beijing Nova Program
Ministry of Science and Technology of the People's Republic of China
Program for New Century Excellent Talents in University
National Natural Science Foundation of China
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications; General Business, Management and Accounting; Information Systems
Cited by
134 articles.