Similarity versus relatedness: A novel approach in extractive Persian document summarisation

Published:2017-02-01 Issue:3 Volume:44 Page:314-330
ISSN:0165-5515
Container-title:Journal of Information Science
language:en
Short-container-title:Journal of Information Science

Author:

Shafiee Fatemeh¹, Shamsfard Mehrnoush¹

Affiliation:

1. Natural Language Processing (NLP) Research Lab, Faculty of Computer Science and Engineering, Shahid Beheshti University, Iran

Abstract

Automatic text summarisation is the process of creating a summary from one or more documents by eliminating the details and preserving the worthwhile information. This article presents a single/multi-document summariser that uses a novel clustering method to create summaries. First, a feature selection phase is employed. Then, FarsNet, the Persian WordNet, is used to extract the semantic information of words. The input sentences are then grouped into three main kinds of cluster: similarity, relatedness and coherency. Each similarity cluster contains sentences similar to its core, while each relatedness cluster contains sentences that are related (but not similar) to its core. The coherency clusters identify the sentences that should be kept together to preserve the coherency of the summary. Next, the similarity-cluster centroid with the highest feature score is added to an empty summary. The summary is then enlarged iteratively by including related sentences from the relatedness clusters and excluding sentences similar to its current content. Coherency clusters are applied to the resulting summary in the last step. The proposed method has been compared with three existing text summarisation systems and techniques for the Persian language: FarsiSum, Parsumist and Ijaz. The proposed method improves experimental results on several measures, including precision, recall, F-measure, ROUGE-N and ROUGE-L.
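
The selection loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two thresholds and the single-pass loop are assumptions made for illustration, a plain word-overlap (Jaccard) score stands in for the FarsNet-based semantic measure, the highest-scoring sentence stands in for the top similarity-cluster centroid, and the coherency step is omitted.

```python
def jaccard(a, b):
    """Word-overlap score in [0, 1]; a stand-in for a FarsNet-based measure."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_summary(sentences, scores, max_len, related_min=0.1, similar_min=0.5):
    """Greedy sketch: seed with the top-scored sentence, then add sentences
    that are related to the summary but not too similar to it.

    related_min and similar_min are hypothetical thresholds, not values
    taken from the paper.
    """
    if not sentences:
        return []
    # Visit sentences from highest to lowest feature score.
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = [order[0]]  # seed: highest feature score
    for i in order[1:]:
        if len(chosen) >= max_len:
            break
        # Strongest overlap between the candidate and anything already chosen.
        best = max(jaccard(sentences[i], sentences[j]) for j in chosen)
        if best >= similar_min:
            continue  # too similar to existing content: redundant
        if best >= related_min:
            chosen.append(i)  # related but not similar: keep
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


sentences = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "the cat chased a mouse",
    "stocks fell sharply today",
]
scores = [0.9, 0.8, 0.7, 0.6]
summary = build_summary(sentences, scores, max_len=3)
```

With these toy inputs, the second sentence is skipped as a near-duplicate of the seed, the third is kept because it shares some vocabulary with the seed, and the fourth is skipped as unrelated.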

Publisher

SAGE Publications

Subject

Library and Information Sciences, Information Systems

Link

http://journals.sagepub.com/doi/pdf/10.1177/0165551517693537

Cited by 7 articles.

1. Features in extractive supervised single-document summarization: case of Persian news;Language Resources and Evaluation;2024-05-08

2. Studying the cognitive relatedness between topics in the global science landscape: The case of Big Data research;Journal of Information Science;2022-09-18

3. Analysis of Various Machine Learning Techniques used for Automatic Text Summarization;2022 Fifth International Conference on Computational Intelligence and Communication Technologies (CCICT);2022-07

4. Multi-level text document similarity estimation and its application for plagiarism detection;Iran Journal of Computer Science;2022-02-08

5. Unsupervised extractive multi-document summarization method based on transfer learning from BERT multi-task fine-tuning;Journal of Information Science;2021-02-15