1. Wu S, Dimakis A G, and Sanghavi S, Learning distributions generated by one-layer ReLU networks, 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 2019.
2. Wang Y, Liu Y T, and Ma Z M, The scale-invariant space for attention layer in neural network, Neurocomputing, 2020, 392: 1–10.
3. Neyshabur B, Salakhutdinov R R, and Srebro N, Path-SGD: Path-normalized optimization in deep neural networks, Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS 2015), 2015, 2422–2430.
4. Zheng S X, Meng Q, Zhang H S, et al., Capacity control of ReLU neural networks by basis-path norm, Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019), 2019.
5. Meng Q, Zheng S X, Zhang H S, et al., G-SGD: Optimizing ReLU neural networks in its positively scale-invariant space, International Conference on Learning Representations (ICLR 2019), 2019.