1. Ahn, D., Kim, S., Hong, H., Ko, B.C., 2023. Star-transformer: a spatio-temporal cross attention transformer for human action recognition. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 3330–3339.
2. Ba, J.L., Kiros, J.R., Hinton, G.E., 2016. Layer normalization. arXiv preprint arXiv:1607.06450.
3. Berthelot, D., Carlini, N., Cubuk, E.D., Kurakin, A., Zhang, H., Raffel, C., Sohn, K., 2020. Remixmatch: Semi-supervised learning with distribution alignment and augmentation anchoring. In: 8th International Conference on Learning Representations. ICLR 2020, Addis Ababa, Ethiopia.
4. Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A., 2020. Unsupervised learning of visual features by contrasting cluster assignments. Adv. Neural Inf. Process. Syst. 33.
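5. Chen, T., Kornblith, S., Norouzi, M., Hinton, G., 2020. A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning. PMLR, pp. 1597–1607.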