1. Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions;Sarker;SN Comput. Sci.,2021
2. Finding flatter minima with SGD;Jastrzebski,2018
3. Greedy layer-wise training of deep networks;Bengio;Adv. Neural Inf. Process. Syst.,2006
4. Batch normalization: Accelerating deep network training by reducing internal covariate shift;Ioffe,2015
5. Layer normalization;Ba,2016