Affiliation:
1. Purdue University, West Lafayette, Indiana, USA
2. Virginia Tech, Blacksburg, Virginia, USA
3. University of Chicago, Chicago, Illinois, USA
4. George Mason University, Fairfax County, Virginia, USA
Abstract
Our goal is to provide a review of deep learning methods which provide insight into structured high‐dimensional data. Merging the two cultures of algorithmic and statistical learning sheds light on model construction and improved prediction and inference, leveraging the duality and trade‐off between the two. Prediction, interpolation, and uncertainty quantification can be achieved using probabilistic methods at the output layer of the model. Rather than using the shallow additive architectures common to most statistical models, deep learning uses layers of semi‐affine input transformations to provide a predictive rule. Applying these layers of transformations leads to a set of attributes (or features) to which probabilistic statistical methods can be applied. Thus, the best of both worlds can be achieved: scalable prediction rules fortified with uncertainty quantification, where sparse regularization finds the features. We review the duality between shallow and wide models, such as principal components regression and partial least squares, and deep but skinny architectures, such as autoencoders, multilayer perceptrons, convolutional neural networks, and recurrent neural networks. The connection with data transformations is of practical importance for finding good network architectures. Incorporating probabilistic components at the output level allows for predictive uncertainty. We illustrate this idea by comparing plain Gaussian processes (GP) with partial least squares + Gaussian process (PLS + GP) and deep learning + Gaussian process (DL + GP).
This article is categorized under:
Statistical Learning and Exploratory Methods of the Data Sciences > Deep Learning
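The PLS + GP construction described in the abstract can be illustrated with a minimal sketch, not the authors' implementation: a supervised linear reduction (partial least squares) plays the role of the feature-finding layers, and a Gaussian process at the output level supplies predictive uncertainty. The data, number of components, and kernel choice below are illustrative assumptions using standard scikit-learn components.

```python
# Minimal PLS + GP sketch (illustrative assumptions, not the authors' code):
# reduce high-dimensional inputs to a few PLS scores, then fit a Gaussian
# process on those scores so the output layer yields predictive uncertainty.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                       # hypothetical high-dimensional inputs
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

# Supervised reduction: PLS scores stand in for the learned features.
pls = PLSRegression(n_components=2).fit(X, y)
Z = pls.transform(X)

# Probabilistic output layer: a GP on the low-dimensional scores gives
# both a point prediction and an uncertainty estimate.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True).fit(Z, y)
mean, std = gp.predict(pls.transform(X[:5]), return_std=True)
print(mean, std)
```

Replacing the PLS step with a trained network's last hidden layer would give the analogous DL + GP variant compared in the article.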