Abstract
This article explores a model-based clustering method for compositional data. Most methods for compositional data analysis require some kind of transformation. The proposed method instead builds a mixture model on the Dirichlet distribution, which works directly with the unit-sum constraint. The mixture model is fitted with a hard EM algorithm, modified to overcome the problem of fast convergence to solutions with empty clusters. This work includes a rigorous simulation study that evaluates the performance of the proposed method across varying dimensions, numbers of clusters, and degrees of overlap. The performance of the model is also compared with other popular clustering algorithms often used for compositional data analysis, e.g., k-means, Gaussian mixture model (GMM), Gaussian mixture model with hard EM (Hard GMM), partitioning around medoids (PAM), Clustering Large Applications based on Randomized Search (CLARANS), and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The comparison covers simulated data as well as two real data problems, one from the business and marketing domain and one from the physical science domain. The study shows promising results, exploiting different distributional patterns of compositional data.
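To make the idea of a hard-EM Dirichlet mixture concrete, here is a minimal sketch in plain Python. It is not the authors' implementation, and the abstract does not describe their modification. Several choices are assumptions made purely for illustration: the farthest-point initialization, the method-of-moments parameter update (in place of a maximum-likelihood M-step), and the empty-cluster guard that moves one point from the largest cluster into any empty one. All function names are hypothetical.

```python
import math
import random


def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at a composition x (parts > 0, summing to 1)."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def moment_estimate(points):
    """Method-of-moments Dirichlet fit (assumed here; the paper may use another update)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    estimates = []
    for j in range(d):
        var = sum((p[j] - mean[j]) ** 2 for p in points) / n
        if var > 1e-12:
            # For a Dirichlet, Var[x_j] = m_j (1 - m_j) / (s + 1), with s = sum(alpha).
            estimates.append(mean[j] * (1.0 - mean[j]) / var - 1.0)
    s = sum(estimates) / len(estimates) if estimates else 50.0  # arbitrary fallback
    s = min(max(s, 1.0), 1e4)
    return [max(m * s, 1e-3) for m in mean]


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def farthest_point_init(data, k):
    """Assumed initialization: spread k seeds apart, then label by nearest seed."""
    seeds = [data[0]]
    while len(seeds) < k:
        seeds.append(max(data, key=lambda x: min(sq_dist(x, s) for s in seeds)))
    return [min(range(k), key=lambda c: sq_dist(x, seeds[c])) for x in data]


def fill_empty_clusters(labels, k, rng):
    """Assumed guard: move one point from the largest cluster into each empty one."""
    for c in range(k):
        if labels.count(c) == 0:
            largest = max(range(k), key=labels.count)
            donors = [i for i, lab in enumerate(labels) if lab == largest]
            labels[rng.choice(donors)] = c


def hard_em_dirichlet(data, k, n_iter=50, seed=0):
    """Hard EM: alternate per-cluster fits with arg-max reassignment (needs len(data) >= k)."""
    rng = random.Random(seed)
    labels = farthest_point_init(data, k)
    alphas = []
    for _ in range(n_iter):
        fill_empty_clusters(labels, k, rng)
        # M-step: refit each cluster from the points currently assigned to it.
        alphas = [moment_estimate([x for x, lab in zip(data, labels) if lab == c])
                  for c in range(k)]
        log_w = [math.log(labels.count(c) / len(data)) for c in range(k)]
        # Hard E-step: each point goes to its single most probable component.
        new_labels = [max(range(k),
                          key=lambda c: log_w[c] + dirichlet_logpdf(x, alphas[c]))
                      for x in data]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, alphas
```

Calling `hard_em_dirichlet(data, k=2)` on a list of compositions returns one label per row and one fitted parameter vector per cluster. Because the density is evaluated on the simplex itself, no log-ratio transformation is needed, but every part must be strictly positive.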
Publisher
Public Library of Science (PLoS)