Fast deep mixtures of Gaussian process experts
-
Published:2024-01-08
Issue:3
Volume:113
Page:1483-1508
-
ISSN:0885-6125
-
Container-title:Machine Learning
-
language:en
-
Short-container-title:Mach Learn
Author:
Etienam Clement, Law Kody J. H., Wade Sara, Zankin Vitaly
Abstract
Mixtures of experts have become an indispensable tool for flexible modelling in a supervised learning context, allowing not only the mean function but the entire density of the output to change with the inputs. Sparse Gaussian processes (GP) have shown promise as a leading candidate for the experts in such models, and in this article, we propose to design the gating network for selecting the experts from such mixtures of sparse GPs using a deep neural network (DNN). Furthermore, a fast one-pass algorithm called Cluster–Classify–Regress (CCR) is leveraged to approximate the maximum a posteriori (MAP) estimator extremely quickly. This powerful combination of model and algorithm together delivers a novel method which is flexible, robust, and extremely efficient. In particular, the method is able to outperform competing methods in terms of accuracy and uncertainty quantification. The cost is competitive on low-dimensional and small data sets, but is significantly lower for higher-dimensional and big data sets. Iteratively maximizing the distribution of experts given allocations and allocations given experts does not provide significant improvement, which indicates that the algorithm achieves a good approximation to the local MAP estimator very fast. This insight can also be useful in the context of other mixture of experts models.
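The one-pass Cluster–Classify–Regress idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes k-means clustering on the standardized joint (input, output) pairs, a small scikit-learn MLP standing in for the DNN gating network, and full (not sparse) GP experts. The function names `fit_ccr` and `predict_ccr`, the toy step-function data, and the choice to predict with a gate-weighted mean of the expert means are all hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.neural_network import MLPClassifier


def fit_ccr(X, y, n_experts=2, seed=0):
    """One pass of cluster -> classify -> regress (illustrative only)."""
    # Cluster: group the standardized joint (x, y) pairs (assumed: k-means).
    Z = np.column_stack([X, y])
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    labels = KMeans(n_clusters=n_experts, n_init=10,
                    random_state=seed).fit_predict(Z)

    # Classify: the gating network predicts the cluster label from x alone.
    gate = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                         random_state=seed).fit(X, labels)

    # Regress: fit one GP expert per cluster (full GP here, not sparse).
    experts = {}
    for k in np.unique(labels):
        kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-2)
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                                      random_state=seed)
        experts[k] = gp.fit(X[labels == k], y[labels == k])
    return gate, experts


def predict_ccr(gate, experts, X_new):
    """Predictive mean: expert means weighted by gate probabilities."""
    probs = gate.predict_proba(X_new)
    mean = np.zeros(len(X_new))
    for j, k in enumerate(gate.classes_):
        mean += probs[:, j] * experts[k].predict(X_new)
    return mean


# Toy data (hypothetical): a noisy step function with a jump at x = 0.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.where(X[:, 0] < 0.0, -1.0, 1.0) + 0.05 * rng.standard_normal(200)

gate, experts = fit_ccr(X, y)
pred = predict_ccr(gate, experts, np.array([[-0.5], [0.5]]))
```

Each stage runs exactly once, with no alternation between allocations and experts, which is the sense in which the abstract describes the algorithm as one-pass.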
Funder
UT-Battelle; Alan Turing Institute
Publisher
Springer Science and Business Media LLC