Identifying High Quality Document–Summary Pairs through Text Matching

Published: 2017-06-12 Issue: 2 Volume: 8 Page: 64
ISSN: 2078-2489
Container-title: Information
Language: en
Short-container-title: Information

Author:

Hou Yongshuai, Xiang Yang, Tang Buzhou, Chen Qingcai, Wang Xiaolong, Zhu Fangze

Abstract

Text summarization, namely automatically generating a short summary of a given document, is a difficult task in natural language processing. Deep learning has gradually been adopted for text summarization, but large-scale, high-quality datasets for this technique are still lacking. In this paper, we propose a novel deep learning method to identify high-quality document–summary pairs for building a large-scale dataset of such pairs. Concretely, a long short-term memory (LSTM)-based model is designed to measure the quality of document–summary pairs. To leverage information across all parts of each document, we further propose an improved LSTM-based model that removes the forget gate from the LSTM unit. Experiments on training and test sets built from Sina Weibo (a Chinese microblog website similar to Twitter) show that the LSTM-based models significantly outperform baseline models in terms of the area under the receiver operating characteristic curve (AUC).
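The abstract gives no equations, so the following is only a minimal sketch of one plausible reading of "removing the forget gate": the cell state is carried over unchanged (as if the forget gate were fixed at 1) and new information is added through the input gate. The function names, the `params` layout, and the `use_forget_gate` flag are illustrative assumptions, not the authors' implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, b):
    """Return W @ x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def lstm_step(x, h_prev, c_prev, params, use_forget_gate=True):
    """One LSTM time step on plain Python lists.

    params maps a gate name ("i", "f", "o", "g") to a (W, b) pair, where W
    acts on the concatenation [x, h_prev]. This layout is hypothetical.
    """
    z = x + h_prev  # list concatenation of input and previous hidden state

    def gate(name, activation):
        W, b = params[name]
        return [activation(v) for v in affine(W, z, b)]

    i = gate("i", sigmoid)    # input gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell update

    if use_forget_gate:
        f = gate("f", sigmoid)
    else:
        # Assumption: a removed forget gate means the old cell state is
        # always kept in full, so earlier parts of the document are not decayed.
        f = [1.0] * len(c_prev)

    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

With all weights and biases set to zero, every sigmoid gate evaluates to 0.5 and the candidate update to 0, so the standard step halves the previous cell state while the variant without the forget gate leaves it unchanged.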

Publisher

MDPI AG

Subject

Information Systems

Link

https://www.mdpi.com/2078-2489/8/2/64/pdf

References: 57 articles.

1. Recent automatic text summarization techniques: a survey

2. Trends in Extractive and Abstractive Techniques in Text Summarization

Cited by 3 articles.

1. Long Text QA Matching Model Based on BiGRU–DAttention–DSSM;Mathematics;2021-05-17

2. Global Encoding for Long Chinese Text Summarization;ACM Transactions on Asian and Low-Resource Language Information Processing;2020-11-30

3. A Framework for Word Embedding Based Automatic Text Summarization and Evaluation;Information;2020-01-31