Affiliation:
1. School of Information Science and Engineering, Central South University, Changsha, Hunan 410083, P. R. China
Abstract
In the era of information overload, text summarization has become a focus of attention in a number of diverse fields such as, question answering systems, intelligence analysis, news recommendation systems, search results in web search engines, and so on. A good document representation is the key point in any successful summarizer. Learning this representation becomes a very active research in natural language processing field (NLP). Traditional approaches mostly fail to deliver a good representation. Word embedding has proved an excellent performance in learning the representation. In this paper, a modified BM25 with Word Embeddings are used to build the sentence vectors from word vectors. The entire document is represented as a set of sentence vectors. Then, the similarity between every pair of sentence vectors is computed. After that, TextRank, a graph-based model, is used to rank the sentences. The summary is generated by picking the top-ranked sentences according to the compression rate. Two well-known datasets, DUC2002 and DUC2004, are used to evaluate the models. The experimental results show that the proposed models perform comprehensively better compared to the state-of-the-art methods.
Publisher
World Scientific Pub Co Pte Lt
Subject
Artificial Intelligence,Computer Graphics and Computer-Aided Design,Computer Networks and Communications,Software
Cited by
6 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献