
Training Neural Networks by Time-Fractional Gradient Descent

Published:2022-09-26 Issue:10 Volume:11 Page:507
ISSN:2075-1680
Container-title:Axioms
language:en
Short-container-title:Axioms

Author:

Xie Jingyi, Li Sirui

Abstract

Motivated by the weighted averaging method for training neural networks, we study the time-fractional gradient descent (TFGD) method, which is based on the time-fractional gradient flow, and explore how memory dependence influences neural network training. The TFGD algorithm is studied through theoretical derivations and neural network training experiments. Compared with the standard gradient descent (GD) algorithm, TFGD gives a significantly better optimization result when the fractional order α is close to 1 and the learning rate η is chosen appropriately. The comparison is extended to experiments on the MNIST dataset with various learning rates, which verify that TFGD has potential advantages when α lies between 0.95 and 0.99. This suggests that memory dependence can improve the training performance of neural networks.
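The abstract does not state the TFGD update rule. As a minimal sketch, assuming the time-fractional gradient flow is the Caputo-type equation D_t^α θ(t) = −∇L(θ(t)) with 0 < α ≤ 1, discretized explicitly with the standard L1 scheme, one TFGD step can be written as below. The L1 discretization, the helper names `l1_weights` and `tfgd`, and the toy quadratic loss are illustrative assumptions, not the authors' implementation; the learning rate η stands in for Γ(2 − α)Δt^α.

```python
import numpy as np


def l1_weights(alpha: float, n: int) -> np.ndarray:
    """Return L1-scheme memory weights b_k = (k+1)^(1-alpha) - k^(1-alpha), k = 0..n-1.

    b_0 is set to 1 explicitly: for alpha = 1 the formula hits 0**0, and
    fixing b_0 = 1 makes the weights [1, 0, 0, ...], i.e. plain GD.
    """
    k = np.arange(max(n, 1), dtype=float)
    b = (k + 1.0) ** (1.0 - alpha) - k ** (1.0 - alpha)
    b[0] = 1.0
    return b


def tfgd(grad, theta0, eta: float, alpha: float, n_steps: int) -> list:
    """Explicit L1 discretisation of the flow D_t^alpha theta = -grad L(theta).

    Assumed update (a hypothetical reconstruction, not the paper's code):
        theta^n = theta^{n-1}
                  - sum_{k=1}^{n-1} b_k (theta^{n-k} - theta^{n-k-1})
                  - eta * grad(theta^{n-1}),
    with eta standing in for Gamma(2 - alpha) * dt**alpha.
    Returns the full list of iterates theta^0, ..., theta^{n_steps}.
    """
    thetas = [np.asarray(theta0, dtype=float)]
    b = l1_weights(alpha, n_steps)
    for n in range(1, n_steps + 1):
        # Memory term: weighted sum of all earlier increments (O(n) per step).
        memory = np.zeros_like(thetas[0])
        for k in range(1, n):
            memory += b[k] * (thetas[n - k] - thetas[n - k - 1])
        thetas.append(thetas[n - 1] - memory - eta * grad(thetas[n - 1]))
    return thetas


if __name__ == "__main__":
    # Toy ill-conditioned quadratic L(theta) = 0.5 * theta^T A theta (illustrative only).
    A = np.diag([1.0, 10.0])

    def grad(th):
        return A @ th

    def loss(th):
        return 0.5 * th @ A @ th

    for alpha in (1.0, 0.99, 0.95):
        path = tfgd(grad, [1.0, 1.0], eta=0.05, alpha=alpha, n_steps=100)
        print(f"alpha={alpha}: final loss {loss(path[-1]):.3e}")
```

Under this assumed discretization, α = 1 makes every memory weight b_k with k ≥ 1 vanish, so the update reduces exactly to plain GD. For α < 1 the weights b_k are positive and decreasing, so the update can be rearranged as a convex combination of all past iterates (weights 1 − b_1, b_1 − b_2, …, b_{n−1}, which sum to 1) minus η∇L(θ^{n−1}); this is one way to read the connection to weighted averaging that motivates the method.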

Funder

the Growth Foundation for Youth Science and Technology Talent of Educational Commission of Guizhou Province of China

Publisher

MDPI AG

Subject

Geometry and Topology, Logic, Mathematical Physics, Algebra and Number Theory, Analysis

Link

https://www.mdpi.com/2075-1680/11/10/507/pdf


Cited by 1 article

1. LBM-MHD Data-Driven Approach to Predict Rayleigh–Bénard Convective Heat Transfer by Levenberg–Marquardt Algorithm; Axioms; 2023-02-13