Modeling subpopulations for hierarchically structured data

Published: 2023-11-22
ISSN: 1932-1864
Container-title: Statistical Analysis and Data Mining: The ASA Data Science Journal
Language: en
Short-container-title: Statistical Analysis

Author:

Simpson Andrew¹, Michael Semhar¹, Borchert Dylan¹, Saunders Christopher¹, Tang Larry²

Affiliation:

1. Mathematics and Statistics, South Dakota State University, Brookings, South Dakota, USA

2. Department of Statistics and Data Science and National Center for Forensic Science, University of Central Florida, Orlando, Florida, USA

Abstract

The field of forensic statistics offers a unique hierarchical data structure in which a population is composed of several subpopulations of sources, and a sample is collected from each source. The data are therefore hierarchical (samples nested within sources), and the sources additionally belong to underlying subpopulations, which adds a further layer of complexity. Finite mixtures are well established for modeling heterogeneity; however, previous parameter estimation procedures assume that the data are generated by simple random sampling. We propose a semi-supervised mixture modeling approach to the subpopulation structure that leverages the fact that the samples in a collection are known to come from the same source, even though that source's subpopulation is unknown. A simulation study and real data analyses based on well-known glass datasets and a keystroke dynamics typing dataset show that the proposed approach performs better than other approaches previously used in practice.
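The central idea in the abstract, that all samples from one source share a single unknown subpopulation, can be illustrated with a minimal EM sketch. This is not the authors' estimator: it assumes univariate Gaussian components, and the function `fit_source_level_mixture` and the toy data are hypothetical. The difference from an ordinary mixture fit under simple random sampling lies in the E-step, where each source's log-likelihood is summed over all of its samples, so the whole source receives one posterior probability per subpopulation.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fit_source_level_mixture(sources, k, n_iter=100, eps=1e-9):
    """Hypothetical EM sketch: K-component univariate Gaussian mixture in which
    all samples from one source share a single (unknown) subpopulation label.

    sources: list of lists; sources[i] holds the measurements from source i.
    Returns (weights, means, variances, resp), where resp[i][j] is the
    posterior probability that source i belongs to subpopulation j.
    """
    all_x = [x for s in sources for x in s]
    grand_mean = sum(all_x) / len(all_x)
    overall_var = max(sum((x - grand_mean) ** 2 for x in all_x) / len(all_x), eps)

    # Deterministic start: spread the initial means across the sorted source means.
    src_means = sorted(sum(s) / len(s) for s in sources)
    if k == 1:
        means = [grand_mean]
    else:
        means = [src_means[round(j * (len(src_means) - 1) / (k - 1))] for j in range(k)]
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    resp = []

    for _ in range(n_iter):
        # E-step: one responsibility vector per SOURCE, not per observation.
        resp = []
        for s in sources:
            logs = [
                math.log(weights[j])
                + sum(log_normal_pdf(x, means[j], variances[j]) for x in s)
                for j in range(k)
            ]
            top = max(logs)  # log-sum-exp shift for numerical stability
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step: every sample inherits its source's responsibility.
        for j in range(k):
            r = [row[j] for row in resp]
            weights[j] = max(sum(r) / len(sources), eps)
            denom = sum(r[i] * len(s) for i, s in enumerate(sources))
            if denom < eps:
                continue  # (almost) empty component: keep previous mean and variance
            means[j] = sum(r[i] * x for i, s in enumerate(sources) for x in s) / denom
            var = sum(r[i] * (x - means[j]) ** 2
                      for i, s in enumerate(sources) for x in s) / denom
            variances[j] = max(var, eps)
        total_w = sum(weights)
        weights = [w / total_w for w in weights]

    return weights, means, variances, resp


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical toy data: 12 sources, 5 measurements each, two subpopulations.
    truth = [0] * 6 + [1] * 6
    data = [[rng.gauss(10.0 * t, 1.0) for _ in range(5)] for t in truth]
    w, mu, var, resp = fit_source_level_mixture(data, k=2)
    print("weights:", [round(v, 3) for v in w])
    print("means:", [round(m, 2) for m in mu])
```

An observation-level mixture fitted to the same data could split the samples of one source across components; pooling the log-likelihood at the source level is one simple way to encode the known grouping, while the subpopulation label itself stays latent.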

Funder

National Institute of Justice

National Science Foundation

Publisher

Wiley

Subject

Computer Science Applications, Information Systems, Analysis

Link

https://onlinelibrary.wiley.com/doi/am-pdf/10.1002/sam.11650