Structure-Driven Representation Learning for Deep Clustering

Published: 2023-10-16
Volume: 18, Issue: 1, Pages: 1-25
ISSN: 1556-4681
Container-title: ACM Transactions on Knowledge Discovery from Data
Short-container-title: ACM Trans. Knowl. Discov. Data
Language: en

Authors:
Wang Xiang (1), Jing Liping (1), Liu Huafeng (1), Yu Jian (1)

Affiliation:
1. Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, China
Abstract
As an important branch of unsupervised learning, clustering plays a central role in data mining, and capturing the group-discriminative properties of each sample is crucial to it. Among clustering methods, deep clustering delivers promising results thanks to the strong representational power of neural networks. However, most deep clustering methods adopt sample-level learning strategies: a standalone data point barely captures the context of its whole cluster and may therefore receive a sub-optimal cluster assignment. To tackle this issue, we propose a Structure-driven Representation Learning (SRL) method that introduces latent structure information into the representation learning process at both the local and the global level. Specifically, a local-structure-driven sample representation strategy approximates the estimation of the data distribution: it models the neighborhood distribution of samples with latent structure information and exploits the statistical dependencies between them to improve cluster consistency. A global-structure-driven cluster representation strategy encodes the context of each cluster from both its samples (exemplar theory) and its corresponding prototype (prototype theory), so that each cluster relates only to its most similar samples while different clusters are separated as much as possible. The two strategies are seamlessly combined into a joint optimization problem that can be solved efficiently. Experiments on six widely used datasets demonstrate the superiority of SRL over state-of-the-art clustering methods.
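To make the two levels of the objective concrete, here is a minimal NumPy sketch, not the paper's actual implementation: the local term pushes each sample's soft cluster assignment toward the average assignment of its k nearest neighbors (neighborhood-distribution consistency), and the global term pulls each sample toward its most likely prototype. All names (`srl_style_loss`, the temperature 0.1, k=3) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def srl_style_loss(z, prototypes, k=3):
    """Toy structure-driven objective: local neighborhood agreement
    plus global sample-prototype alignment (illustrative only)."""
    # Soft cluster assignments from cosine similarity to prototypes.
    zn = z / np.linalg.norm(z, axis=1, keepdims=True)
    pn = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    p = softmax(zn @ pn.T / 0.1)               # (n, K) assignment probabilities

    # Local term: each sample's assignment should agree with the mean
    # assignment of its k nearest neighbors in embedding space.
    sim = zn @ zn.T
    np.fill_diagonal(sim, -np.inf)              # exclude self from neighbors
    nbrs = np.argsort(sim, axis=1)[:, -k:]      # indices of the k nearest neighbors
    q = p[nbrs].mean(axis=1)                    # neighborhood distribution, (n, K)
    local = -(q * np.log(p + 1e-12)).sum(axis=1).mean()

    # Global term: pull each sample toward its most likely prototype,
    # a crude stand-in for exemplar/prototype agreement.
    global_ = -np.log(p.max(axis=1) + 1e-12).mean()
    return local + global_

# Synthetic data: three well-separated Gaussian clusters.
n, d, K = 60, 8, 3
centers = rng.normal(size=(K, d)) * 4.0
z = centers[np.repeat(np.arange(K), n // K)] + rng.normal(size=(n, d))
loss = srl_style_loss(z, centers)
print(round(float(loss), 4))
```

Both terms are non-negative (each is a cross-entropy-style penalty), and on well-separated clusters the loss stays small because neighbors share assignments; the actual SRL objective is jointly optimized with the network's representation, which this static sketch omits.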
Funder
National Natural Science Foundation of China
National Key Research and Development Program
Joint Foundation of the Ministry of Education
Beijing Natural Science Foundation
Fundamental Research Funds for the Central Universities
Chinese Academy of Sciences
Publisher
Association for Computing Machinery (ACM)
Subject
General Computer Science