Affiliations:
1. Departamento de Sistemas, Universidad Autónoma Metropolitana, Azcapotzalco 02200, Mexico
2. Departamento de Ciencias Básicas, Universidad Autónoma Metropolitana, Azcapotzalco 02200, Mexico
Abstract
A novel morphing activation function is proposed, motivated by wavelet theory and the use of wavelets as activation functions. Morphing refers to a gradual change of shape that mimics several apparently unrelated activation functions. The shape is controlled by the fractional-order derivative, a trainable parameter optimized during the neural network's learning process. From the morphing activation function, taking only integer-order derivatives yields efficient piecewise polynomial versions of several existing activation functions. Experiments show that the performance of the polynomial versions PolySigmoid, PolySoftplus, PolyGeLU, PolySwish, and PolyMish is similar to or better than that of their counterparts Sigmoid, Softplus, GeLU, Swish, and Mish. Furthermore, the best shape can be learned from the data by optimizing the fractional-order derivative with gradient-descent algorithms, motivating the study of a more general formula, based on fractional calculus, for building and adapting activation functions with properties useful in machine learning.
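As a rough illustration of the trainable-shape idea described in the abstract, the sketch below shows an activation function whose shape parameter is registered as a learnable parameter and optimized by gradient descent together with the network weights. The abstract does not give the morphing activation's formula, so this sketch substitutes Swish, x * sigmoid(alpha * x), with a trainable alpha as a stand-in; the class name, initial value, and toy model are illustrative assumptions, not the paper's method.

import torch
import torch.nn as nn

class TrainableShapeActivation(nn.Module):
    # Stand-in for a shape-adaptive activation: Swish with a learnable
    # shape parameter alpha (NOT the paper's morphing activation, whose
    # formula is not stated in the abstract).
    def __init__(self, alpha_init: float = 1.0):
        super().__init__()
        # nn.Parameter makes alpha part of model.parameters(), so the
        # optimizer updates the activation's shape during training.
        self.alpha = nn.Parameter(torch.tensor(alpha_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.alpha * x)

# Usage: the activation's shape is fitted to the data along with the weights.
model = nn.Sequential(nn.Linear(8, 16), TrainableShapeActivation(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

The same mechanism applies if the shape parameter is a fractional order of differentiation, as in the paper: any differentiable parameterization of the activation can be trained this way.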