Affiliation:
1. Department of Mathematical Sciences, Durham University, Durham DH1 3LE, UK
2. Durham Research Methods Centre, Durham University, Durham DH1 3LE, UK
Abstract
We consider situations in which multivariate data are to be clustered in a way that establishes an ordering of the clusters with respect to an underlying latent variable. Our motivating example is scatterplots of traffic flow and speed, in which a pattern of consecutive clusters can be thought of as linked by a latent variable that is interpretable as traffic density. We focus on latent structures of linear or quadratic shape and present an estimation methodology based on expectation–maximization that estimates both the latent subspace and the clusters along it. The directed clustering approach is summarized in two algorithms and applied to the traffic example. Connections to related methodology, including principal curves, are briefly drawn.
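To make the idea of clusters ordered along a latent line concrete, the following is a minimal sketch only and not the two algorithms of the paper. It assumes a linear latent structure and replaces the joint estimation with a two-stage simplification: the line is taken as the first principal component, and a one-dimensional Gaussian mixture is then fitted by expectation–maximization to the projected scores. All function names (`fit_line`, `em_1d`, `directed_clusters`) are hypothetical.

```python
import numpy as np

def fit_line(X):
    """Return the data mean and the first principal direction (unit vector)."""
    mean = X.mean(axis=0)
    # Right singular vectors of the centred data are the principal directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[0]

def em_1d(z, k, n_iter=200):
    """EM for a 1-D Gaussian mixture with k components on the scores z."""
    # Initialise the means at evenly spaced quantiles of the scores.
    mu = np.quantile(z, (np.arange(k) + 0.5) / k)
    var = np.full(k, z.var() / k)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibilities, computed on the log scale.
        log_p = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (z[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted updates of weights, means and variances.
        nk = r.sum(axis=0) + 1e-12  # guard against an empty component
        pi = nk / nk.sum()
        mu = (r * z[:, None]).sum(axis=0) / nk
        var = (r * (z[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-8
    return pi, mu, var, r

def directed_clusters(X, k):
    """Cluster X into k groups labelled 0..k-1 in order along the fitted line."""
    mean, direction = fit_line(X)
    z = (X - mean) @ direction          # latent score of each observation
    _, mu, _, r = em_1d(z, k)
    order = np.argsort(mu)              # component indices sorted by mean
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)          # map component index to its rank
    labels = rank[r.argmax(axis=1)]
    return labels, mu[order], direction
```

The sign of a principal direction is arbitrary, so the ordering returned here is determined only up to reversal; fixing the orientation (for example, so that the latent score increases with traffic density) would need additional information.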