1. Group normalization;Wu;Proceedings of the European Conference on Computer Vision (ECCV),2018
2. Gaussian error linear units (GELUs);Hendrycks,2016
3. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
4. Deep networks with stochastic depth;Huang;European Conference on Computer Vision,2016
5. Reducing transformer depth on demand with structured dropout;Fan,2019