Affiliation:
1. Anna University, Chennai, India
2. Department of CSE, Anna University, Chennai, India
3. Department of IST, Anna University, Chennai, India
Abstract
Customization of information from web documents is an immense job that involves mainly the shortening of original texts. This task is carried out using summarization techniques. In general, an automatically generated summary is of two types – extractive and abstractive. Extractive methods use surface level and statistical features for the selection of important sentences, without considering the meaning conveyed by those sentences. In contrast, abstractive methods need a formal semantic representation, where the selection of important components and the rephrasing of the selected components are carried out using the semantic features associated with the words as well as the context. Furthermore, a deep linguistic analysis is needed for generating summaries. However, the bottleneck behind abstractive summarization is that it requires semantic representation, inference rules and natural language generation. In this paper, The authors propose a semi-supervised bootstrapping approach for the identification of important components for abstractive summarization. The input to the proposed approach is a fully connected semantic graph of a document, where the semantic graphs are constructed for sentences, which are then connected by synonym concepts and co-referring entities to form a complete semantic graph. The direction of the traversal of nodes is determined by a modified spreading activation algorithm, where the importance of the nodes and edges are decided, based on the node and its connected edges under consideration. Summary obtained using the proposed approach is compared with extractive and template based summaries, and also evaluated using ROUGE scores.
Subject
Computer Networks and Communications,Information Systems
Cited by
8 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献