Abstract
Extractive document summarization is usually treated as a sequence labeling task in which the summary is composed of sentences selected from the original document. However, the selected sentences are often highly redundant in semantic space, so the resulting summary is semantically redundant as well. To alleviate this problem, we propose a model that reduces the semantic redundancy of summaries by introducing a clustering algorithm to select sentences that differ in semantic space, and we improve the base BERT model to score sentences. We evaluate our model and perform significance testing using ROUGE on the CNN/DailyMail datasets, comparing against six baselines: two traditional methods and four state-of-the-art deep learning models. The results validate the effectiveness of our approach, which leverages the K-means algorithm to produce summaries whose sentences are more accurate and less semantically repetitive.
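The cluster-then-select idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the selection rule, so the sketch assumes that sentence embeddings and per-sentence scores are already available (for example from a BERT-based scorer), clusters the embeddings with a plain K-means, and keeps the highest-scoring sentence from each cluster. The names `kmeans` and `select_summary`, the deterministic seeding, and the toy data are all hypothetical.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. Assumption: the first k points seed the
    centroids, which keeps this toy example deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_summary(embeddings, scores, k):
    """Return the sorted indices of the best-scoring sentence per cluster.

    Assumption: one sentence is taken from each of the k clusters, so the
    selected sentences lie in different regions of the semantic space.
    """
    labels = kmeans(embeddings, k)
    best = {}
    for idx, lab in enumerate(labels):
        if lab not in best or scores[idx] > scores[best[lab]]:
            best[lab] = idx
    return sorted(best.values())


# Toy data: sentences 0 and 1 are near-duplicates, as are sentences 2 and 3.
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
scores = [0.9, 0.8, 0.3, 0.7]
print(select_summary(embeddings, scores, k=2))
```

On this toy input, picking the two highest scores alone would return the near-duplicate sentences 0 and 1, whereas the cluster-based selection returns sentences 0 and 3, one from each semantic group.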
Funder
Research on the Evidence Chain Construction from the Analysis of the Investigation Documents
Research on Spatial Optimization and Allocation of Distributed Scientific and Technological Resources
Research on key technologies for intelligent diagnosis of reservoir and dam health driven by both knowledge and data
Publisher
Springer Science and Business Media LLC