1. Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. CoRR abs/1607.06450.
2. Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. 2014. Food-101 — Mining discriminative components with random forests. In ECCV (6), Vol. 8694. 446–461. https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/
3. Andrew Brock, Soham De, and Samuel L. Smith. 2021. Characterizing signal propagation to close the performance gap in unnormalized ResNets. In ICLR.
4. Mauro Cettolo, Jan Niehues, Sebastian Stüker, Luisa Bentivogli, and Marcello Federico. 2014. Report on the 11th IWSLT evaluation campaign. In Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign. 2–17. https://workshop2014.iwslt.org/
5. Soham De and Samuel L. Smith. 2020. Batch normalization biases residual blocks towards the identity function in deep networks. In NeurIPS.