Affiliation:
1. School of Computing and Artificial Intelligence, Southwest Jiaotong University; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, China
2. School of Computing and Artificial Intelligence, Southwest Jiaotong University, China
Abstract
Co-clustering clusters samples and features simultaneously, and can also reveal the relationships between row clusters and column clusters. It has therefore attracted extensive research attention and is widely used in recommendation systems, gene analysis, medical data analysis, natural language processing, image analysis, and social network analysis. In this paper, we survey the research landscape of co-clustering, with an emphasis on the latest advances, and identify current research challenges and future directions. First, because researchers define co-clustering from different perspectives, we summarize the definition of co-clustering, its extended definitions, and related problems. Second, existing co-clustering techniques are roughly categorized into four classes: information-theory-based, graph-theory-based, matrix-factorization-based, and other-theory-based. Third, we review applications of co-clustering in recommendation systems, medical data analysis, natural language processing, image analysis, and social network analysis. Furthermore, ten popular co-clustering algorithms are empirically studied on ten benchmark datasets under four metrics (accuracy, purity, block discriminant index, and running time), and their results are objectively reported. Finally, future work is outlined to give insight into the research challenges of co-clustering.
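To make the idea concrete, here is a minimal sketch of co-clustering a toy data matrix with scikit-learn's SpectralCoclustering, a graph-theory-based method of the kind the survey categorizes. The library choice, the toy matrix, and the cluster count are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# Toy data matrix: rows = samples, columns = features,
# with two planted blocks (co-clusters) on the diagonal.
X = np.array([
    [9, 8, 9, 0, 0, 0],
    [8, 9, 8, 0, 0, 0],
    [9, 9, 9, 0, 0, 0],
    [0, 0, 0, 7, 8, 7],
    [0, 0, 0, 8, 7, 8],
    [0, 0, 0, 7, 7, 9],
])

# Spectral co-clustering partitions rows and columns jointly by
# bipartite spectral graph partitioning.
model = SpectralCoclustering(n_clusters=2, random_state=0)
model.fit(X)

# row_labels_ / column_labels_ assign each sample and each feature
# to a co-cluster simultaneously; rows 0-2 end up in one co-cluster
# with columns 0-2, and rows 3-5 with columns 3-5 (label order may swap).
print(model.row_labels_)
print(model.column_labels_)
```

The recovered row and column labels together describe the block structure of the matrix, which is exactly the relationship between row clusters and column clusters that co-clustering exposes.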
Publisher
Association for Computing Machinery (ACM)
References: 160 articles.