1. Cybenko, G. (1989) Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 2(4): 303-314. https://doi.org/10.1007/BF02551274
2. Cichocki, A. and Unbehauen, R. (1993) Neural Networks for Optimization and Signal Processing, 1st edn. John Wiley & Sons, Inc., USA. ISBN 0471930105
3. Hornik, K. (1991) Approximation capabilities of multilayer feedforward networks. Neural Networks 4(2): 251-257. https://doi.org/10.1016/0893-6080(91)90009-T
4. LeCun, Y., Bengio, Y. and Hinton, G. (2015) Deep learning. Nature 521(7553): 436-444. https://doi.org/10.1038/nature14539
5. Schmidhuber, J. (2015) Deep learning in neural networks: An overview. Neural Networks 61: 85-117. https://doi.org/10.1016/j.neunet.2014.09.003