
iVIBRATE

Published:2006-04 Issue:2 Volume:24 Page:245-294
ISSN:1046-8188
Container-title:ACM Transactions on Information Systems
language:en
Short-container-title:ACM Trans. Inf. Syst.

Author:

Chen Keke¹, Liu Ling¹

Affiliation:

1. Georgia Institute of Technology, Atlanta, GA

Abstract

With continued advances in communication network technology and sensing technology, there is astounding growth in the amount of data produced and made available through cyberspace. Efficient and high-quality clustering of large datasets continues to be one of the most important problems in large-scale data analysis. A commonly used methodology for cluster analysis on large datasets is the three-phase framework of sampling/summarization, iterative cluster analysis, and disk-labeling. There are three known problems with this framework which demand effective solutions. The first problem is how to effectively define and validate irregularly shaped clusters, especially in large datasets. Automated algorithms and statistical methods are typically not effective in handling these particular clusters. The second problem is how to effectively label the entire dataset on disk (disk-labeling) without introducing additional errors, including the solutions for dealing with outliers, irregular clusters, and cluster boundary extension. The third problem is the lack of research about issues related to effectively integrating the three phases. In this article, we describe iVIBRATE, an interactive visualization-based three-phase framework for clustering large datasets. The two main components of iVIBRATE are its VISTA visual cluster-rendering subsystem, which invites human interplay into the large-scale iterative clustering process through interactive visualization, and its adaptive ClusterMap labeling subsystem, which offers visualization-guided disk-labeling solutions that are effective in dealing with outliers, irregular clusters, and cluster boundary extension.
Another important contribution of iVIBRATE development is the identification of the special issues presented in integrating the two components and the sampling approach into a coherent framework, as well as the solutions for improving the reliability of the framework and for minimizing the number of errors generated within the cluster analysis process. We study the effectiveness of the iVIBRATE framework through a walkthrough example dataset of a million records, and we experimentally evaluate the iVIBRATE approach using both real-life and synthetic datasets. Our results show that iVIBRATE can efficiently involve the user in the clustering process and generate high-quality clustering results for large datasets.
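
The generic three-phase framework named in the abstract (sampling, iterative cluster analysis on the sample, then labeling the full dataset) can be illustrated with a minimal sketch. This is not the iVIBRATE method: plain k-means stands in for the interactive VISTA step, nearest-centroid assignment stands in for ClusterMap labeling, and every function and parameter name below (`three_phase`, `kmeans`, `sample_size`, `chunk_size`) is a hypothetical chosen for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(sample, k, iters=20, seed=0):
    """Plain k-means on the sample (assumed stand-in for the interactive step)."""
    rng = random.Random(seed)
    centroids = rng.sample(sample, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in sample:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as its group's mean; keep the old one if empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def three_phase(records, k, sample_size, chunk_size=1000, seed=0):
    rng = random.Random(seed)
    # Phase 1: sampling. Draw a subset small enough to analyze in memory.
    sample = rng.sample(records, min(sample_size, len(records)))
    # Phase 2: iterative cluster analysis on the sample only.
    centroids = kmeans(sample, k, seed=seed)
    # Phase 3: labeling. Scan the full dataset chunk by chunk, as one would
    # when reading from disk, and assign each record to its nearest centroid.
    labels = []
    for start in range(0, len(records), chunk_size):
        for p in records[start:start + chunk_size]:
            labels.append(nearest(p, centroids))
    return centroids, labels
```

Nearest-centroid labeling of this kind assumes roughly spherical clusters, which is exactly the limitation the abstract raises for irregularly shaped clusters, outliers, and cluster boundary extension.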

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/1148020.1148024

References: 58 articles.

1. The Grand Tour: A Tool for Viewing Multidimensional Data

2. Baeza-Yates R. and Ribeiro-Neto B. 1999. Modern Information Retrieval. Addison Wesley, New York.

3. The New Jersey data reduction report;Barbará D.;IEEE Data Eng. Bull.,1997

Cited by 33 articles.

1. Interactive Trajectory Star Coordinates i-tStar and Its Extension i-tStar (3D);Computer Modeling in Engineering & Sciences;2023

2. Interactive Visual Cluster Analysis by Contrastive Dimensionality Reduction;IEEE Transactions on Visualization and Computer Graphics;2022

3. Interactive Clustering;ACM Computing Surveys;2021-01-31

4. Interactive visual analytics tool for multidimensional quantitative and categorical data analysis;Information Visualization;2020-05-25

5. Interactive Assigning of Conference Sessions with Visualization and Topic Modeling;IEEE PAC VIS SYMP;2020