
Gradient-based training of Gaussian Mixture Models for High-Dimensional Streaming Data

Published: 2021-07-26

Author:

Gepperth Alexander¹, Pfülb Benedikt¹

Affiliation:

1. Fulda University of Applied Sciences (Hochschule Fulda)

Abstract

We present an approach for efficiently training Gaussian Mixture Models (GMMs) by Stochastic Gradient Descent (SGD) on non-stationary, high-dimensional streaming data. Our training scheme does not require data-driven parameter initialization (e.g., k-means), so GMMs can be trained from a random initialization. Furthermore, the approach allows mini-batch sizes as low as 1, which are typical for streaming-data settings. Two major problems in such settings are undesirable local optima during early training phases and numerical instabilities caused by high data dimensionality. We introduce an adaptive annealing procedure to address the first problem, while numerical instabilities are eliminated by using an exponential-free approximation to the standard GMM log-likelihood. Experiments on a variety of visual and non-visual benchmarks show that GMMs can be trained with our SGD approach entirely without, for instance, k-means-based centroid initialization. The approach also compares favorably to an online variant of Expectation-Maximization (EM), stochastic EM (sEM), which it outperforms by a large margin on very high-dimensional data.
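To make the abstract's ingredients concrete (random initialization, mini-batch size 1, and an exponential-free surrogate for the GMM log-likelihood), here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes diagonal covariances parameterized by per-dimension precisions (1/sigma), mixture weights given by a softmax over logits, and, as the exponential-free surrogate, the max-component approximation max_k [log pi_k + log N_k(x)], which never exponentiates the per-sample component log-densities. This surrogate is a lower bound on the exact log-likelihood and lies within log K of it. The adaptive annealing procedure is omitted, and all function names and the toy two-cluster stream are hypothetical.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)


def log_gauss_diag(x, mu, prec):
    """Log-density of a diagonal Gaussian with per-dimension precision prec[d] = 1 / sigma[d]."""
    return sum(
        math.log(p) - 0.5 * LOG_2PI - 0.5 * (p * (xd - m)) ** 2
        for xd, m, p in zip(x, mu, prec)
    )


def log_weights(logits):
    """log softmax(logits), computed stably by subtracting the max logit."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def component_scores(x, gmm):
    """log(pi_k) + log N_k(x) for every component k."""
    lw = log_weights(gmm["logits"])
    return [lw[k] + log_gauss_diag(x, gmm["mu"][k], gmm["prec"][k]) for k in range(len(lw))]


def exact_loglik(x, gmm):
    """Standard GMM log-likelihood via log-sum-exp (exponentiates the component scores)."""
    s = component_scores(x, gmm)
    m = max(s)
    return m + math.log(sum(math.exp(v - m) for v in s))


def max_component_loglik(x, gmm):
    """Assumed exponential-free surrogate: max_k [log pi_k + log N_k(x)]."""
    return max(component_scores(x, gmm))


def sgd_step(x, gmm, lr=0.01, min_prec=1e-3):
    """One gradient-ascent step on the surrogate with mini-batch size 1."""
    s = component_scores(x, gmm)
    k = max(range(len(s)), key=s.__getitem__)  # only the best-matching component gets the data term
    mu, prec = gmm["mu"][k], gmm["prec"][k]
    for d, xd in enumerate(x):
        diff = xd - mu[d]
        g_mu = prec[d] ** 2 * diff                    # d/d mu_d of log N_k(x)
        g_prec = 1.0 / prec[d] - prec[d] * diff ** 2  # d/d prec_d of log N_k(x)
        mu[d] += lr * g_mu
        prec[d] = max(min_prec, prec[d] + lr * g_prec)  # keep precisions positive
    # d/d logit_j of log softmax_k(logits) = [j == k] - pi_j
    pis = [math.exp(v) for v in log_weights(gmm["logits"])]
    for j, pj in enumerate(pis):
        gmm["logits"][j] += lr * ((1.0 if j == k else 0.0) - pj)


def random_gmm(k, dim, rng):
    """Random init (no k-means): means uniform in [0, 1), unit precisions, equal weights."""
    return {
        "mu": [[rng.random() for _ in range(dim)] for _ in range(k)],
        "prec": [[1.0] * dim for _ in range(k)],
        "logits": [0.0] * k,
    }


def make_stream(n, rng):
    """Hypothetical toy stream: 2-D samples around (0, 0) or (3, 3), std 0.5."""
    centres = [(0.0, 0.0), (3.0, 3.0)]
    return [[rng.gauss(c, 0.5) for c in rng.choice(centres)] for _ in range(n)]


rng = random.Random(0)
gmm = random_gmm(k=4, dim=2, rng=rng)
data = make_stream(3000, rng)
probe = data[:200]  # fixed subset used to track the surrogate before and after training
before = sum(max_component_loglik(x, gmm) for x in probe) / len(probe)
for x in data:  # samples arrive one at a time, i.e. mini-batch size 1
    sgd_step(x, gmm, lr=0.01)
after = sum(max_component_loglik(x, gmm) for x in probe) / len(probe)
```

Updating only the winning component's mean and precision keeps each step cheap and avoids exponentiating large negative log-densities, which is where high dimensionality causes underflow in the exact log-sum-exp. Without an annealing scheme, however, such hard winner-take-all updates are prone to exactly the poor early local optima the abstract mentions.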

Publisher

Research Square Platform LLC

Cited by 2 articles.

1. Generalizing Self-organizing Maps: Large-Scale Training of GMMs and Applications in Data Science; Lecture Notes in Networks and Systems; 2024

2. Probabilistic Models with Invariance; Lecture Notes in Networks and Systems; 2024