Affiliation:
1. Florida International University
Abstract
Over the last decade, document clustering, as one of the key tasks in information organization and navigation, has been widely studied. Many algorithms have been developed for addressing various challenges in document clustering and for improving clustering performance. However, relatively few research efforts have been reported on evaluating and understanding document clustering results. In this article, we present DClusterE, a comprehensive and effective framework for document clustering evaluation and understanding using information visualization. DClusterE integrates cluster validation with user interactions and offers rich visualization tools for users to examine document clustering results from multiple perspectives. In particular, through informative views including force-directed layout view, matrix view, and cluster view, DClusterE provides not only different aspects of document inter/intra-clustering structures, but also the corresponding relationship between clustering results and the ground truth. Additionally, DClusterE supports general user interactions such as zoom in/out, browsing, and interactive access of the documents at different levels. Two new techniques are proposed to implement DClusterE: (1) A novel multiplicative update algorithm (MUA) for matrix reordering to generate narrow-banded (or clustered) nonzero patterns from documents. Combined with coarse seriation, MUA is able to provide better visualization of the cluster structures. (2) A Mallows-distance-based algorithm for establishing the relationship between the clustering results and the ground truth, which serves as the basis for coloring schemes. Experiments and user studies are conducted to demonstrate the effectiveness and efficiency of DClusterE.
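To make the idea of relating clustering results to the ground truth more concrete, here is a minimal sketch in Python. It is not the Mallows-distance-based algorithm from the article, whose details the abstract does not give. It assumes a simpler stand-in: a one-to-one matching of clusters to ground-truth classes that maximizes the number of shared documents, found by brute force. The function names `contingency` and `match_clusters` are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def contingency(pred, truth):
    """Count documents shared by each (cluster, class) pair."""
    clusters = sorted(set(pred))
    classes = sorted(set(truth))
    table = [[0] * len(classes) for _ in clusters]
    for p, t in zip(pred, truth):
        table[clusters.index(p)][classes.index(t)] += 1
    return clusters, classes, table


def match_clusters(pred, truth):
    """Assumed stand-in (not the article's method): pick the one-to-one
    cluster-to-class assignment with the largest total overlap.
    Brute force, so only practical for a handful of clusters."""
    clusters, classes, table = contingency(pred, truth)
    if len(clusters) > len(classes):
        raise ValueError("this sketch needs at least as many classes as clusters")
    best_overlap, best_perm = -1, None
    # Try every injective assignment of clusters to classes.
    for perm in permutations(range(len(classes)), len(clusters)):
        overlap = sum(table[i][j] for i, j in enumerate(perm))
        if overlap > best_overlap:
            best_overlap, best_perm = overlap, perm
    mapping = {clusters[i]: classes[j] for i, j in enumerate(best_perm)}
    return mapping, best_overlap


if __name__ == "__main__":
    pred = [0, 0, 0, 1, 1, 2, 2, 2]                    # cluster labels
    truth = ["b", "b", "a", "a", "a", "c", "c", "b"]   # ground-truth classes
    print(match_clusters(pred, truth))
```

A mapping like this could serve as the basis for a coloring scheme in which each cluster is drawn in the color of its matched class, so that mismatched documents stand out visually.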
Funder
U.S. Department of Homeland Security
Division of Biological Infrastructure
Army Research Office
Division of Mathematical Sciences
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Theoretical Computer Science
Cited by
11 articles.