Local and Overall Deviance R-Squared Measures for Mixtures of Generalized Linear Models

Published:2023-04-04 Issue:2 Volume:40 Page:233-266
ISSN:0176-4268
Container-title:Journal of Classification
language:en
Short-container-title:J Classif

Author:

Di Mari Roberto, Ingrassia Salvatore, Punzo Antonio

Abstract

In generalized linear models (GLMs), measures of lack of fit are typically defined as the deviance between two nested models, and a deviance-based R² is commonly used to evaluate the fit. In this paper, we extend deviance measures to mixtures of GLMs, whose parameters are estimated by maximum likelihood (ML) via the EM algorithm. Such measures are defined both locally, i.e., at the cluster level, and globally, i.e., with reference to the whole sample. At the cluster level, we propose a normalized two-term decomposition of the local deviance into explained and unexplained local deviances. At the sample level, we introduce an additive normalized decomposition of the total deviance into three terms, each of which evaluates a different aspect of the fitted model: (1) the cluster separation on the dependent variable, (2) the proportion of the total deviance explained by the fitted model, and (3) the proportion of the total deviance that remains unexplained. We use the local and global decompositions to define, respectively, local and overall deviance R² measures for mixtures of GLMs, which we illustrate for Gaussian, Poisson, and binomial responses by means of a simulation study. The proposed fit measures are then used to assess and interpret clusters of COVID-19 spread in Italy at two time points.
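
As a point of reference for the abstract, the following is a minimal sketch of the classical single-model deviance R² for a Poisson GLM, computed as one minus the ratio of the fitted-model deviance to the intercept-only (null) deviance. It is not the paper's local or overall measure for mixtures, whose formulas are not given in this record. The function names and the example data are hypothetical, and the fitted means are assumed to come from some already-estimated model.

```python
import math


def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)).

    The term y * log(y / mu) is taken to be 0 when y == 0.
    Assumes every fitted mean in mu is strictly positive when y > 0.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def deviance_r2(y, mu_fitted):
    """Classical deviance R-squared: 1 - D(fitted) / D(null).

    The null model is assumed to be intercept-only, so its fitted
    mean is the sample mean of y for every observation.
    """
    y_bar = sum(y) / len(y)
    d_null = poisson_deviance(y, [y_bar] * len(y))
    d_fit = poisson_deviance(y, mu_fitted)
    return 1.0 - d_fit / d_null


# Hypothetical counts and hypothetical fitted means from some Poisson GLM.
y = [1, 2, 3, 5, 8]
mu_hat = [1.2, 2.1, 2.8, 5.3, 7.6]
r2 = deviance_r2(y, mu_hat)
print(f"deviance R2 = {r2:.3f}")
```

A value near 1 indicates that the fitted means reproduce the observed counts almost as well as a saturated model would, while a value of 0 corresponds to a fit no better than the intercept-only model.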

Funder

Università degli Studi di Catania

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences; Statistics, Probability and Uncertainty; Psychology (miscellaneous); Mathematics (miscellaneous)

Link

https://link.springer.com/content/pdf/10.1007/s00357-023-09432-4.pdf

References (35 articles)

1. Biernacki, C., Celeux, G., & Govaert, G. (2003). Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Computational Statistics & Data Analysis, 41(3-4), 561–575.

2. Cameron, A. C., & Windmeijer, F. A. G. (1996). R-squared measures for count data regression models with applications to health-care utilization. Journal of Business & Economic Statistics, 14(2), 209–220.

3. Cameron, A. C., & Windmeijer, F. A. G. (1997). An R-squared measure of goodness of fit for some common nonlinear regression models. Journal of Econometrics, 77(2), 329–342.

4. Celeux, G., & Govaert, G. (1992). A classification EM algorithm for clustering and two stochastic versions. Computational Statistics & Data Analysis, 14(3), 315–332.

5. Cerdeira, J. O., Martins, M. J., & Silva, P. C. (2012). A combinatorial approach to assess the separability of clusters. Journal of Classification, 29(1), 7–22.

Cited by 2 articles.

1. Cluster-weighted modeling with measurement error in covariates;Communications in Statistics - Theory and Methods;2024-02-09

2. An Investigation of LES Wall Modeling for Rayleigh-Bénard Convection via Interpretable and Physics-Aware Feedforward Neural Networks with DNS;Journal of the Atmospheric Sciences;2023-12-21