Affiliation:
1. University of Milano–Bicocca, Department of Economics, Management and Statistics, Piazza dell’Ateneo Nuovo 1, 20126 Milano, Italy
2. Duke University, Department of Statistical Science, Box 90251, Durham, North Carolina 27708, U.S.A.
Abstract
Summary
Loss-based clustering methods, such as k-means clustering and its variants, are standard tools for finding groups in data. However, the lack of quantification of uncertainty in the estimated clusters is a disadvantage. Model-based clustering based on mixture models provides an alternative approach, but such methods face computational problems and are highly sensitive to the choice of kernel. In this article we propose a generalized Bayes framework that bridges between these paradigms through the use of Gibbs posteriors. In conducting Bayesian updating, the loglikelihood is replaced by a loss function for clustering, leading to a rich family of clustering methods. The Gibbs posterior represents a coherent updating of Bayesian beliefs without needing to specify a likelihood for the data, and can be used for characterizing uncertainty in clustering. We consider losses based on Bregman divergence and pairwise similarities, and develop efficient deterministic algorithms for point estimation along with sampling algorithms for uncertainty quantification. Several existing clustering algorithms, including k-means, can be interpreted as generalized Bayes estimators in our framework, and thus we provide a method of uncertainty quantification for these approaches, allowing, for example, calculation of the probability that a data point is well clustered.
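To make the idea of replacing the loglikelihood with a clustering loss concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a Gibbs posterior of the form p(c | X) ∝ exp{-λ ℓ(c; X)}, with ℓ the k-means within-cluster sum of squares, a uniform prior over label vectors (empty clusters are allowed, purely for simplicity), a fixed number of clusters K, and a hand-picked tempering parameter λ (`lam`). The function names and the naive one-label-at-a-time sampler are illustrative choices. Pairwise co-clustering frequencies across draws then give a simple summary of clustering uncertainty.

```python
import numpy as np


def kmeans_loss(X, labels, K):
    """Within-cluster sum of squared distances to the cluster means."""
    loss = 0.0
    for k in range(K):
        members = X[labels == k]
        if len(members) > 0:  # an empty cluster contributes zero loss
            loss += np.sum((members - members.mean(axis=0)) ** 2)
    return loss


def gibbs_posterior_sampler(X, K, lam, n_iter=200, seed=0):
    """Draw label vectors from p(c | X) proportional to exp(-lam * loss).

    Assumption: uniform prior over label vectors. Each label is resampled
    from its full conditional, recomputing the loss for every candidate
    cluster (simple, but only practical for small data sets).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labels = rng.integers(K, size=n)
    samples = np.empty((n_iter, n), dtype=int)
    for t in range(n_iter):
        for i in range(n):
            losses = np.empty(K)
            for k in range(K):
                labels[i] = k
                losses[k] = kmeans_loss(X, labels, K)
            logp = -lam * losses
            p = np.exp(logp - logp.max())  # subtract max for stability
            p /= p.sum()
            labels[i] = rng.choice(K, p=p)
        samples[t] = labels
    return samples


def coclustering_matrix(samples):
    """Fraction of draws in which each pair of points shares a cluster."""
    same = samples[:, :, None] == samples[:, None, :]
    return same.mean(axis=0)


# Toy usage: two well-separated groups of three points each.
X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
draws = gibbs_posterior_sampler(X_toy, K=2, lam=5.0, n_iter=200, seed=1)
P_toy = coclustering_matrix(draws[100:])  # discard the first half as burn-in
print(np.round(P_toy, 2))
```

In this sketch, larger values of `lam` concentrate the draws around the loss-minimizing partition, while smaller values spread mass over more partitions. The value used here is arbitrary and chosen only for illustration.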
Publisher
Oxford University Press (OUP)
Subject
Applied Mathematics; Statistics, Probability and Uncertainty; General Agricultural and Biological Sciences; Agricultural and Biological Sciences (miscellaneous); General Mathematics; Statistics and Probability
Cited by
5 articles.