Distribution Optimization: An evolutionary algorithm to separate Gaussian mixtures

Published:2020-01-20 Issue:1 Volume:10 Page:
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Lerch Florian, Ultsch Alfred, Lötsch Jörn

Abstract

Finding subgroups in biomedical data is a key task in biomedical research and precision medicine. Even one-dimensional data, such as many different readouts from cell experiments, preclinical or human laboratory experiments, or clinical signs, often reveal a more complex distribution than a single mode. Gaussian mixtures play an important role in the multimodal distribution of one-dimensional data. However, although fitting of Gaussian mixture models (GMM) is often aimed at obtaining the separate modes composing the mixture, current technical implementations, often using the Expectation Maximization (EM) algorithm, are not optimized for this task. This occasionally results in poorly separated modes that are unsuitable for determining a distinguishable group structure in the data. Here, we introduce “Distribution Optimization”, an evolutionary algorithm for GMM fitting that uses an adjustable error function based on chi-square statistics and the probability density. The algorithm can be directly targeted at the separation of the modes of the mixture by employing an additional criterion for the degree to which single modes overlap. The obtained GMM fits were comparable with those obtained with classical EM-based fits, except for data sets where the EM algorithm produced unsatisfactory results with overlapping Gaussian modes. There, the proposed algorithm successfully separated the modes, providing a basis for meaningful group separation while fitting the data satisfactorily. Through its optimization toward mode separation, the evolutionary algorithm proved a particularly suitable basis for group separation in multimodally distributed data, outperforming alternative EM-based methods.
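The approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the histogram-based chi-square error, the overlap measure (area under the minimum of two weighted densities, relative to the smaller weight), the penalty weight, and the simple truncation-selection scheme are all assumptions chosen for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def histogram(data, n_bins=20):
    """Equal-width histogram; returns (lower bound, bin width, counts)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        counts[min(int((x - lo) / width), n_bins - 1)] += 1
    return lo, width, counts


def chi_square_error(hist, n, params):
    """Chi-square distance between observed and mixture-expected bin counts."""
    lo, width, counts = hist
    err = 0.0
    for b, observed in enumerate(counts):
        left, right = lo + b * width, lo + (b + 1) * width
        p = sum(w * (normal_cdf(right, m, s) - normal_cdf(left, m, s))
                for w, m, s in params)
        expected = max(n * p, 1e-9)  # floor avoids division by zero
        err += (observed - expected) ** 2 / expected
    return err


def max_overlap(params, lo, hi, steps=100):
    """Largest pairwise overlap between components (assumed measure)."""
    dx = (hi - lo) / steps
    grid = [lo + (i + 0.5) * dx for i in range(steps)]
    worst = 0.0
    for i in range(len(params)):
        for j in range(i + 1, len(params)):
            (wi, mi, si), (wj, mj, sj) = params[i], params[j]
            area = dx * sum(min(wi * normal_pdf(x, mi, si),
                                wj * normal_pdf(x, mj, sj)) for x in grid)
            worst = max(worst, area / min(wi, wj))
    return worst


def random_individual(lo, hi, k, rng):
    raw = [rng.random() + 0.1 for _ in range(k)]
    total = sum(raw)
    return [(r / total, rng.uniform(lo, hi), rng.uniform(0.05, 0.5) * (hi - lo))
            for r in raw]


def mutate(ind, scale, rng):
    """Gaussian perturbation of weights, means and standard deviations."""
    raw = [max(w + rng.gauss(0.0, 0.05), 0.01) for w, _, _ in ind]
    total = sum(raw)
    return [(r / total, m + rng.gauss(0.0, scale),
             max(s + rng.gauss(0.0, 0.5 * scale), 1e-3))
            for r, (_, m, s) in zip(raw, ind)]


def fit_gmm_evolutionary(data, k=2, pop_size=40, generations=100,
                         overlap_tol=0.5, penalty=1000.0, seed=0):
    """Return k (weight, mean, sd) tuples sorted by mean."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    hist, n = histogram(data), len(data)
    scale = 0.05 * (hi - lo)

    def fitness(ind):
        # Goodness of fit plus a penalty once modes overlap beyond the tolerance.
        excess = max(0.0, max_overlap(ind, lo, hi) - overlap_tol)
        return chi_square_error(hist, n, ind) + penalty * excess

    population = [random_individual(lo, hi, k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 4]  # keep the best quarter
        children = [mutate(rng.choice(survivors), scale, rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return sorted(min(population, key=fitness), key=lambda c: c[1])
```

Calling `fit_gmm_evolutionary(data, k=2)` on a list of floats returns the fitted components as `(weight, mean, sd)` tuples. Lowering `overlap_tol` pushes the search toward more strongly separated modes at the cost of a looser fit, which is the trade-off the abstract attributes to the adjustable error function.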

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

http://www.nature.com/articles/s41598-020-57432-w.pdf

References: 24 articles.

1. Ameijeiras-Alonso, J., Crujeiras, R. M. & Rodríguez-Casal, A. Mode testing, critical bandwidth and excess mass. ArXiv e-prints (2016).

2. Dempster, A. P., Laird, N. M. & Rubin, D. B. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B 39, 1–38 (1977).

3. Bishop, C. Pattern recognition and machine learning. (Springer, 2006).

4. Frühwirth-Schnatter, S. Finite Mixture and Markov Switching Models. (Springer New York, 2006).

5. Kim, D. K. & Taylor, J. M. G. The Restricted EM Algorithm for Maximum Likelihood Estimation Under Linear Restrictions on the Parameters. Journal of the American Statistical Association 90, 708–716 (1995).

Cited by 16 articles.

1. A Novel Computational Instrument Based on a Universal Mixture Density Network with a Gaussian Mixture Model as a Backbone for Predicting COVID-19 Variants’ Distributions;Mathematics;2024-04-20

2. A versatile and upgraded version of the LundTax classification algorithm applied to independent cohorts;2023-12-15

3. Modeling the time-dependent transmission rate using gaussian pulses for analyzing the COVID-19 outbreaks in the world;Scientific Reports;2023-03-18

4. A New Polymorphic Comprehensive Model for COVID-19 Transition Cycle Dynamics with Extended Feed Streams to Symptomatic and Asymptomatic Infections;Mathematics;2023-02-23

5. Evaluating the impact of a time-evolving constellation on multi-platform satellite based daily precipitation estimates;Atmospheric Research;2022-12