Abstract
At present, neural networks are becoming deeper and deeper, growing from a few layers to dozens or even more than a hundred. The main advantage of a deep network is that it can express very complex functions: it learns features at different levels of abstraction, such as edge features in the lower layers and complex features in the higher layers. However, simply deepening a network is not always effective, because of a major obstacle, the vanishing gradient: in very deep networks the gradient signal tends to approach zero very quickly, which makes gradient descent extremely slow. Specifically, during back-propagation from the last layer to the first, the gradient is multiplied by a weight matrix at every step, so it can decay exponentially toward zero. (In rare cases the opposite problem, the exploding gradient, occurs, where the gradient grows exponentially during propagation until it overflows.) As a result, during training one finds that the deeper the network, the faster the gradient decays. Therefore, although a deeper network can in principle express arbitrarily complex functions, in practice it becomes harder and harder to train as layers are added, until the proposal of the residual network made it possible to train much deeper networks [1].
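To make the role of the shortcut connection concrete, the following is a minimal sketch of a standard ResNet-style basic block in PyTorch; it is an illustrative assumption about the general technique the abstract refers to, not code from the paper itself. Because the block computes F(x) + x, the identity path lets gradients flow past the convolutional layers even when the gradient through F is very small.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output = F(x) + x.

    The identity shortcut gives back-propagation a path that bypasses
    the weight-matrix products, mitigating vanishing gradients.
    (Illustrative sketch; channel counts and layer choices are assumptions.)
    """
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Add the identity shortcut before the final activation.
        return self.relu(out + x)

# Usage example: stacking such blocks keeps the network trainable at depth.
block = ResidualBlock(channels=64)
y = block(torch.randn(1, 64, 32, 32))
```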
Subject
General Physics and Astronomy
References
9 articles.
1. CondenseNet: An Efficient DenseNet using Learned Group Convolutions; Huang, 2018
2. MobileNetV2: Inverted Residuals and Linear Bottlenecks; Sandler, 2018
3. Xception: Deep Learning with Depthwise Separable Convolutions; Chollet, 2017
4. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices; Zhang, 2018
5. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design; Ma, 2018
Cited by
39 articles.