Affiliation:
1. Dalhousie University, Canada
2. Dalhousie University, Halifax, Canada
3. Universidade de São Paulo, Brazil
Abstract
Document clustering is a necessary step in various analytical and automated activities. When guided by the user, algorithms are tailored to imprint a perspective on the clustering process that reflects the user’s understanding of the dataset. More than just allowing customized adjustment of the clusters, a visual analytics approach provides tools for the user to draw new insights from the collection. While contributing his or her perspective, the user also acquires a deeper understanding of the dataset. To that effect, we propose a novel visual analytics system for interactive document clustering. We built our system on top of clustering algorithms that can adapt to user feedback. In the proposed system, an initial clustering is created based on a user-defined number of clusters and the selected clustering algorithm. A set of coordinated visualizations allows examination of the dataset and the clustering results. The visualizations give the user the highlights of individual documents and an understanding of how documents evolve over the time period to which they relate. Users then interact with the process by changing the key-terms that drive it, according to their knowledge of the document domain. In key-term-based interaction, the user assigns a set of key-terms to each target cluster to guide the clustering algorithm. We have improved that process with a novel algorithm for choosing proper seeds for the clustering. Results demonstrate that the system has considerably improved not only its precision but also its effectiveness in document-based decision making. A set of quantitative experiments and a user study were conducted to show the advantages of the approach for document analytics based on clustering. We also report on the use of the framework in a real decision-making scenario that relates users’ email discussions to decision making in improving patient care. Results show that the framework is useful even for more complex datasets such as email conversations.
Funder
Natural Sciences and Engineering Research Council of Canada
International Development Research Center, Ottawa, Canada
Boeing Company
CNPq and FAPESP
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Human-Computer Interaction
Cited by
10 articles.