1. Ba LJ, Kiros JR, Hinton GE (2016) Layer normalization. In: Advances in neural information processing systems - deep learning symposium, Barcelona, Spain, December 8, 2016
2. Bonev B, Kurth T, Hundt C, Pathak J, Baust M, Kashinath K, Anandkumar A (2023) Spherical Fourier neural operators: Learning stable dynamics on the sphere. In: Proceedings of the 40th international conference on machine learning, ICML 2023, Hawaii, USA, July 23–29, 2023, pp 2806–2823
3. Chang L, Chen W, Huang J, Bin C, Wang W (2021) Exploiting multi-attention network with contextual influence for point-of-interest recommendation. Appl Intell 51:1904–1917
4. Devlin J, Chang M, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2019, Minneapolis, USA, June 2–7, 2019, pp 4171–4186
5. Dong Y, Wang Z, Du J, Fang W, Li L (2023) Attention-based hierarchical denoised deep clustering network. World Wide Web 26(1):441–459