Affiliation:
1. Florida International University
2. NEC Laboratories America
Abstract
Document understanding techniques such as document clustering and multidocument summarization have been receiving much attention recently. Current document clustering methods usually represent the given collection of documents as a document-term matrix and then conduct the clustering process. Although many of these clustering methods can group the documents effectively, it is still hard for people to capture the meaning of the documents since there is no satisfactory interpretation for each document cluster. A straightforward solution is to first cluster the documents and then summarize each document cluster using summarization methods. However, most of the current summarization methods are based solely on the sentence-term matrix and ignore the context dependence of the sentences. As a result, the generated summaries lack guidance from the document clusters. In this article, we propose a new language model to simultaneously cluster and summarize documents by making use of both the document-term and sentence-term matrices. By utilizing the mutual influence of document clustering and summarization, our method provides (1) a better document clustering method with more meaningful interpretation and (2) an effective document summarization method with guidance from document clustering. Experimental results on various document datasets show the effectiveness of our proposed method and the high interpretability of the generated summaries.
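To make the two inputs named in the abstract concrete, the following is a minimal sketch of how a document-term matrix and a sentence-term matrix might be built over a shared vocabulary. It is not the authors' model: the toy corpus, the naive tokenizer, the period-based sentence splitter, and the helper names (`tokenize`, `split_sentences`, `term_matrix`) are all assumptions made for illustration, and raw term counts are used where a real system might use other weightings.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(doc):
    """Split on periods; an assumption that only suits this toy corpus."""
    return [s.strip() for s in doc.split(".") if s.strip()]


def term_matrix(units, vocab):
    """Return one row of raw term counts per text unit (document or sentence)."""
    index = {term: j for j, term in enumerate(vocab)}
    rows = []
    for unit in units:
        row = [0] * len(vocab)
        for term, count in Counter(tokenize(unit)).items():
            row[index[term]] = count
        rows.append(row)
    return rows


# Hypothetical two-document corpus, two sentences each.
docs = [
    "Clustering groups documents. Clusters need labels.",
    "Summaries select sentences. Sentences describe clusters.",
]
sentences = [s for d in docs for s in split_sentences(d)]

# Both matrices share one vocabulary so their columns line up.
vocab = sorted({t for d in docs for t in tokenize(d)})
doc_term = term_matrix(docs, vocab)        # shape: documents x terms
sent_term = term_matrix(sentences, vocab)  # shape: sentences x terms
```

Because the columns are shared, each document row equals the sum of the rows of its own sentences, which is one simple way the two matrices are tied together.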
Funder
Division of Information and Intelligent Systems
Division of Computing and Communication Foundations
Division of Mathematical Sciences
Publisher
Association for Computing Machinery (ACM)
Cited by
60 articles.