Structure-Driven Representation Learning for Deep Clustering

Published: 2023-10-16
Volume: 18, Issue: 1, Pages: 1-25
ISSN: 1556-4681
Container-title: ACM Transactions on Knowledge Discovery from Data
Short-container-title: ACM Trans. Knowl. Discov. Data
Language: en

Authors:
Wang Xiang (1), Jing Liping (1), Liu Huafeng (1), Yu Jian (1)

Affiliation:
1. Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, China
Abstract
As an important branch of unsupervised learning, clustering plays a central role in data mining, and capturing the group-discriminative properties of each sample is crucial to it. Among clustering methods, deep clustering delivers promising results thanks to the strong representational power of neural networks. However, most deep clustering methods adopt sample-level learning strategies: a standalone data point barely captures the context of its whole cluster and may therefore receive a sub-optimal cluster assignment. To tackle this issue, we propose a Structure-driven Representation Learning (SRL) method that introduces latent structure information into the representation learning process at both the local and the global level. Specifically, a local-structure-driven sample representation strategy approximates the estimation of the data distribution: it models the neighborhood distribution of samples with latent structure information and exploits the statistical dependencies between them to improve cluster consistency. A global-structure-driven cluster representation strategy encodes the context of each cluster from both its samples (exemplar theory) and its corresponding prototype (prototype theory), so that each cluster relates only to its most similar samples while different clusters are separated as much as possible. The two strategies are seamlessly combined into a joint optimization problem that can be solved efficiently. Experiments on six widely used datasets demonstrate the superiority of SRL over state-of-the-art clustering methods.
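To make the two levels of the objective concrete, here is a minimal NumPy sketch, not the paper's actual implementation: the local term pushes each sample's soft cluster assignment toward the average assignment of its k nearest neighbors (neighborhood-distribution consistency), and the global term pulls each sample toward its most likely prototype. All names (`srl_style_loss`, the temperature 0.1, k=3) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def srl_style_loss(z, prototypes, k=3):
    """Toy structure-driven objective: local neighborhood agreement
    plus global sample-prototype alignment (illustrative only)."""
    # Soft cluster assignments from cosine similarity to prototypes.
    zn = z / np.linalg.norm(z, axis=1, keepdims=True)
    pn = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    p = softmax(zn @ pn.T / 0.1)               # (n, K) assignment probabilities

    # Local term: each sample's assignment should agree with the mean
    # assignment of its k nearest neighbors in embedding space.
    sim = zn @ zn.T
    np.fill_diagonal(sim, -np.inf)              # exclude self from neighbors
    nbrs = np.argsort(sim, axis=1)[:, -k:]      # indices of the k nearest neighbors
    q = p[nbrs].mean(axis=1)                    # neighborhood distribution, (n, K)
    local = -(q * np.log(p + 1e-12)).sum(axis=1).mean()

    # Global term: pull each sample toward its most likely prototype,
    # a crude stand-in for exemplar/prototype agreement.
    global_ = -np.log(p.max(axis=1) + 1e-12).mean()
    return local + global_

# Synthetic data: three well-separated Gaussian clusters.
n, d, K = 60, 8, 3
centers = rng.normal(size=(K, d)) * 4.0
z = centers[np.repeat(np.arange(K), n // K)] + rng.normal(size=(n, d))
loss = srl_style_loss(z, centers)
print(round(float(loss), 4))
```

Both terms are non-negative (each is a cross-entropy-style penalty), and on well-separated clusters the loss stays small because neighbors share assignments; the actual SRL objective is jointly optimized with the network's representation, which this static sketch omits.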
Funder
National Natural Science Foundation of China
National Key Research and Development Program
Joint Foundation of the Ministry of Education
Beijing Natural Science Foundation
Fundamental Research Funds for the Central Universities
Chinese Academy of Sciences
Publisher
Association for Computing Machinery (ACM)
Subject
General Computer Science