Affiliation:
1. University of Trento, Italy
Abstract
Identifying and extracting valuable information from textual documents in the form of cohesively and appropriately developed summaries is one of the most challenging tasks in text mining and natural language processing. In this article, we present a sequential Markov model, equipped with Bayesian inference, to estimate the degree of importance of sentences in a document and thereby address the text summarisation problem. The proposed methodology models the extractive sentence summarisation as a Bayesian state estimation problem, where the system state is the importance degree of each sentence in a document. The transition and observation models are derived using a nonlinear dynamical system identification based on a recurrent feedback neural model that predicts the sentence observation using the sentence input data. In the end, the transition and observation probability density functions are modelled using a mixture density network. The performance assessment of the system has been carried out by investigating the optimal feature dimensionality and the impact of the model parameters on the system accuracy, using entropy-based risk and loss-based risk measures. Finally, the superiority of the proposed methodology over the state of the art in extractive summarisation is discussed and verified by reporting the recall, precision and accuracy on the real-world benchmark data sets.
Subject
Library and Information Sciences,Information Systems