Everything, altogether, all at once: Addressing data challenges when measuring speech intelligibility through entropy scores

Published:2024-07-24 Issue:7 Volume:56 Page:8132-8154
ISSN:1554-3528
Container-title:Behavior Research Methods
language:en
Short-container-title:Behav Res

Author:

Espejo Jose Manuel Rivera, De Maeyer Sven, Gillis Steven

Abstract

When investigating unobservable, complex traits, data collection and aggregation processes can introduce distinctive features to the data such as boundedness, measurement error, clustering, outliers, and heteroscedasticity. Failure to collectively address these features can result in statistical challenges that prevent the investigation of hypotheses regarding these traits. This study aimed to demonstrate the efficacy of the Bayesian beta-proportion generalized linear latent and mixed model (beta-proportion GLLAMM) (Rabe-Hesketh et al., Psychometrika, 69(2), 167–90, 2004a, Journal of Econometrics, 128(2), 301–23, 2004c, 2004b; Skrondal and Rabe-Hesketh 2004) in handling data features when exploring research hypotheses concerning speech intelligibility. To achieve this objective, the study reexamined data from transcriptions of spontaneous speech samples initially collected by Boonen et al. (Journal of Child Language, 50(1), 78–103, 2023). The data were aggregated into entropy scores. The research compared the prediction accuracy of the beta-proportion GLLAMM with the normal linear mixed model (LMM) (Holmes et al., 2019) and investigated its capacity to estimate a latent intelligibility from entropy scores. The study also illustrated how hypotheses concerning the impact of speaker-related factors on intelligibility can be explored with the proposed model. The beta-proportion GLLAMM was not free of challenges; its implementation required formulating assumptions about the data-generating process and knowledge of probabilistic programming languages, both central to Bayesian methods. Nevertheless, results indicated the superiority of the model in predicting empirical phenomena over the normal LMM, and its ability to quantify a latent potential intelligibility. Additionally, the proposed model facilitated the exploration of hypotheses concerning speaker-related factors and intelligibility.
Ultimately, this research has implications for researchers and data analysts interested in quantitatively measuring intricate, unobservable constructs while accurately predicting the empirical phenomena.

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.3758/s13428-024-02457-6.pdf

References (87 articles)

1. Baker, F. (1998). An Investigation of the Item Parameter Recovery Characteristics of a Gibbs Sampling Procedure. Applied Psychological Measurement, 22(2), 153–169. https://doi.org/10.1177/01466216980222005

2. Baldwin, S., & Fellingham, G. (2013). Bayesian Methods for the Analysis of Small Sample Multilevel Data with a Complex Variance Structure. Psychological Methods, 18(2), 151–164. https://doi.org/10.1037/a0030642

3. Bayes, C., Bazán, J., & García, C. (2012). A New Robust Regression Model for Proportions. Bayesian Analysis, 7(4), 841–866. https://doi.org/10.1214/12-ba728

4. Boonen, N., Kloots, H., & Gillis, S. (2020). Rating the Overall Speech Quality of Hearing-Impaired Children by Means of Comparative Judgements. Journal of Communication Disorders, 83, 1675–1687. https://doi.org/10.1016/j.jcomdis.2019.105969

5. Boonen, N., Kloots, H., Nurzia, P., & Gillis, S. (2023). Spontaneous Speech Intelligibility: Early Cochlear Implanted Children Versus Their Normally Hearing Peers at Seven Years of Age. Journal of Child Language, 50(1), 78–103. https://doi.org/10.1017/S0305000921000714