Affiliation:
1. Dalhousie University, NS, Canada
2. Universidade de São Paulo, Brazil
Abstract
The keyterm-based approach is arguably intuitive for users to direct text-clustering processes and adapt results to various applications in text analysis. Its way of markedly influencing the results, for instance, by expressing important terms in relevance order, requires little knowledge of the algorithm and has a predictable effect, speeding up the task. This article first presents a text-clustering algorithm that can easily be extended into an interactive algorithm. We evaluate its performance against state-of-the-art clustering algorithms in unsupervised mode. Next, we propose three interactive versions of the algorithm based on keyterm labeling, document labeling, and hybrid labeling. We then demonstrate that keyterm labeling is more effective than document labeling in text clustering. Finally, we propose a visual approach to support the keyterm-based version of the algorithm. Visualizations are provided for the whole collection as well as for detailed views of document and cluster relationships. We show the effectiveness and flexibility of our framework, Vis-Kt, by presenting typical clustering cases on real text document collections. A user study is also reported that reveals overwhelmingly positive acceptance toward keyterm-based clustering.
Funder
Boeing Company, CNPq and FAPESP
Natural Sciences and Engineering Research Council of Canada
International Development Research Centre, Ottawa, Canada
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Human-Computer Interaction
Cited by
11 articles.