Affiliation:
1. Purdue University, West Lafayette, Indiana, USA
2. Virginia Tech, Blacksburg, Virginia, USA
3. University of Chicago, Chicago, Illinois, USA
4. George Mason University, Fairfax County, Virginia, USA
Abstract
Our goal is to provide a review of deep learning methods which provide insight into structured high‐dimensional data. Merging the two cultures of algorithmic and statistical learning sheds light on model construction and improved prediction and inference, leveraging the duality and trade‐off between the two. Prediction, interpolation, and uncertainty quantification can be achieved using probabilistic methods at the output layer of the model. Rather than using the shallow additive architectures common to most statistical models, deep learning uses layers of semi‐affine input transformations to provide a predictive rule. Applying these layers of transformations leads to a set of attributes (or features) to which probabilistic statistical methods can be applied. Thus, the best of both worlds can be achieved: scalable prediction rules fortified with uncertainty quantification, where sparse regularization finds the features. We review the duality between shallow and wide models, such as principal components regression and partial least squares, and deep but skinny architectures, such as autoencoders, multilayer perceptrons, convolutional neural networks, and recurrent neural networks. The connection with data transformations is of practical importance for finding good network architectures. Incorporating probabilistic components at the output level allows for predictive uncertainty. We illustrate this idea by comparing plain Gaussian processes (GP) with partial least squares + Gaussian process (PLS + GP) and deep learning + Gaussian process (DL + GP).
This article is categorized under:
Statistical Learning and Exploratory Methods of the Data Sciences > Deep Learning
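The PLS + GP construction described in the abstract can be illustrated with a minimal sketch, not the authors' implementation: a supervised linear reduction (partial least squares) plays the role of the feature-finding layers, and a Gaussian process at the output level supplies predictive uncertainty. The data, number of components, and kernel choice below are illustrative assumptions using standard scikit-learn components.

```python
# Minimal PLS + GP sketch (illustrative assumptions, not the authors' code):
# reduce high-dimensional inputs to a few PLS scores, then fit a Gaussian
# process on those scores so the output layer yields predictive uncertainty.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                       # hypothetical high-dimensional inputs
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

# Supervised reduction: PLS scores stand in for the learned features.
pls = PLSRegression(n_components=2).fit(X, y)
Z = pls.transform(X)

# Probabilistic output layer: a GP on the low-dimensional scores gives
# both a point prediction and an uncertainty estimate.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True).fit(Z, y)
mean, std = gp.predict(pls.transform(X[:5]), return_std=True)
print(mean, std)
```

Replacing the PLS step with a trained network's last hidden layer would give the analogous DL + GP variant compared in the article.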