Abstract
Facilitated by the powerful feature extraction ability of neural networks, deep clustering has achieved great success in analyzing high-dimensional and complex real-world data. The performance of deep clustering methods is affected by various factors such as network structures and learning objectives. However, as pointed out in this survey, the essence of deep clustering lies in the incorporation and utilization of prior knowledge, which is largely ignored by existing works. From pioneering deep clustering methods based on data structure assumptions to recent contrastive clustering methods based on data augmentation invariances, the development of deep clustering intrinsically corresponds to the evolution of prior knowledge. In this survey, we provide a comprehensive review of deep clustering methods by categorizing them into six types of prior knowledge. We find that, in general, prior innovation follows two trends: i) from mining to constructing, and ii) from internal to external. In addition, we provide a benchmark on five widely used datasets and analyze the performance of methods with diverse priors. By offering a prior knowledge perspective, we hope this survey will provide novel insights and inspire future research in the deep clustering community.
Funder
National Natural Science Foundation of China
Fundamental Research Funds for the Central Universities
Publisher
Springer Science and Business Media LLC