Extractive text summarisation using Bayesian state estimation of sentences: A Markovian framework

Author:

Ghanbari Haez Saba¹, Shamsfakhr Farhad¹

Affiliation:

1. University of Trento, Italy

Abstract

Identifying and extracting valuable information from textual documents in the form of cohesive, well-developed summaries is one of the most challenging tasks in text mining and natural language processing. In this article, we present a sequential Markov model, equipped with Bayesian inference, to estimate the degree of importance of sentences in a document and thereby address the text summarisation problem. The proposed methodology models extractive sentence summarisation as a Bayesian state estimation problem, where the system state is the degree of importance of each sentence in a document. The transition and observation models are derived using nonlinear dynamical system identification based on a recurrent feedback neural model that predicts the sentence observation from the sentence input data. The transition and observation probability density functions are then modelled using a mixture density network. System performance is assessed by investigating the optimal feature dimensionality and the impact of the model parameters on system accuracy, using entropy-based and loss-based risk measures. Finally, the superiority of the proposed methodology over the state of the art in extractive summarisation is discussed and verified by reporting recall, precision and accuracy on real-world benchmark data sets.
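To make the state estimation idea concrete, the following is a minimal sketch of a discrete Bayes filter run over the sentences of a document. It is not the authors' implementation: the article derives its transition and observation models from a recurrent neural model and a mixture density network, whereas this sketch assumes a two-level importance state and small hand-picked probability tables. The names `bayes_filter_step`, `estimate_importance`, `TRANSITION`, `OBS_MODEL` and `PRIOR`, and all numeric values, are hypothetical.

```python
def bayes_filter_step(belief, transition, likelihood):
    """Run one predict/update cycle of a discrete Bayes filter."""
    n = len(belief)
    # Predict: push the previous belief through the transition model.
    predicted = [
        sum(transition[i][j] * belief[i] for i in range(n)) for j in range(n)
    ]
    # Update: weight by the observation likelihood, then renormalise.
    unnormalised = [likelihood[j] * predicted[j] for j in range(n)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def estimate_importance(observations, transition, obs_model, prior):
    """Return the posterior over importance levels after each sentence."""
    beliefs = []
    belief = prior
    for obs in observations:
        # Likelihood of this sentence's observation under each state.
        likelihood = [obs_model[state][obs] for state in range(len(prior))]
        belief = bayes_filter_step(belief, transition, likelihood)
        beliefs.append(belief)
    return beliefs


# Assumed (hypothetical) model: state 0 = unimportant, state 1 = important.
TRANSITION = [[0.8, 0.2], [0.4, 0.6]]  # P(state_t | state_{t-1}); rows sum to 1
OBS_MODEL = [[0.7, 0.3], [0.2, 0.8]]   # P(obs | state); obs 1 = salient features seen
PRIOR = [0.5, 0.5]

# One binary observation per sentence, in document order.
posteriors = estimate_importance([1, 0, 1], TRANSITION, OBS_MODEL, PRIOR)
for index, belief in enumerate(posteriors):
    print(f"sentence {index}: P(important) = {belief[1]:.3f}")
```

Under these assumptions, an extractive summary could be formed by keeping the sentences whose posterior probability of being important is highest. A learned model would replace the fixed tables with densities conditioned on sentence features.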

Publisher

SAGE Publications

Subject

Library and Information Sciences, Information Systems
