Clustering Mixed-Type Data via Dirichlet Process Mixture Model with Cluster-Specific Covariance Matrices

Author:

Burhanuddin Nurul Afiqah (1, 2), Ibrahim Kamarulzaman (1), Zulkafli Hani Syahida (3), Mustapha Norwati (4)

Affiliation:

1. Department of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia

2. Institute for Mathematical Research, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia

3. Department of Mathematics and Statistics, Faculty of Science, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia

4. Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia

Abstract

Many studies have shown successful applications of the Dirichlet process mixture model (DPMM) for clustering continuous data. In practice, however, data sets often combine continuous variables with other data types, including ordinal and nominal data. Existing DPMMs for clustering mixed-type data assume a strict covariance matrix structure, resulting in an overfit model. This article explores a DPMM for mixed-type data that allows the covariance matrix to differ from one cluster to another. We assume an underlying latent variable framework for ordinal and nominal data, which is then modeled jointly with the continuous data. An identifiability issue with the covariance matrix poses computational challenges and requires a nonstandard inferential algorithm. The applicability and flexibility of the proposed model are illustrated through simulation examples and real data applications.
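To make the model structure in the abstract concrete, the following is a minimal generative sketch in Python, not the authors' inferential algorithm. It assumes a truncated stick-breaking approximation to the Dirichlet process, one continuous variable paired with one latent Gaussian variable per observation, and fixed cutpoints that turn the latent variable into a three-level ordinal variable. The truncation level, cutpoints, parameter ranges, and all function names are hypothetical choices made for illustration; nominal variables and posterior inference are omitted.

```python
import math
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process (assumed approximation)."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        beta = rng.betavariate(1.0, alpha)
        weights.append(remaining * beta)
        remaining *= 1.0 - beta
    weights.append(remaining)  # the last stick takes the leftover mass
    return weights


def sample_bivariate_normal(mean, cov, rng):
    """Draw from a 2-D Gaussian using a hand-written 2x2 Cholesky factor."""
    (a, b), (_, d) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2


def to_ordinal(latent, cutpoints):
    """Map a latent continuous value to an ordinal level by counting exceeded cutpoints."""
    return sum(latent > c for c in cutpoints)


def simulate(n, alpha=1.0, truncation=5, seed=0):
    """Simulate mixed-type rows (cluster label, continuous value, ordinal level)."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, truncation, rng)
    # Hypothetical cluster parameters: every cluster has its own mean and covariance.
    clusters = []
    for _ in range(truncation):
        mean = (rng.uniform(-4, 4), rng.uniform(-4, 4))
        var1, var2 = rng.uniform(0.5, 2.0), rng.uniform(0.5, 2.0)
        rho = rng.uniform(-0.8, 0.8)
        c = rho * math.sqrt(var1 * var2)
        clusters.append((mean, ((var1, c), (c, var2))))
    cutpoints = (-1.0, 1.0)  # assumed thresholds giving ordinal levels 0, 1, 2
    rows = []
    for _ in range(n):
        k = rng.choices(range(truncation), weights=weights)[0]
        mean, cov = clusters[k]
        x_cont, latent = sample_bivariate_normal(mean, cov, rng)
        rows.append((k, x_cont, to_ordinal(latent, cutpoints)))
    return weights, rows
```

Calling `simulate(200)` returns the mixture weights and 200 simulated rows. Because the correlation between the continuous variable and the latent variable is drawn separately for each cluster, the sketch illustrates the cluster-specific covariance idea that distinguishes the proposed model from DPMMs with a shared covariance structure.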

Publisher

MDPI AG

