1) J.Carreira and A.Zisserman : “Quo vadis, action recognition? A new model and the Kinetics dataset”, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.6299-6308 (2017)
2) K.He, X.Zhang, S.Ren and J.Sun : “Deep residual learning for image recognition”, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.770-778 (2016)
3) X.Wang, R.Girshick, A.Gupta and K.He : “Non-local neural networks”, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.7794-7803 (2018)
4) A.Vaswani, N.Shazeer, N.Parmar, J.Uszkoreit, L.Jones, A.Gomez, L.Kaiser and I.Polosukhin : “Attention is all you need”, Advances in neural information processing systems, vol.30 (2017)
5) A.Dosovitskiy, L.Beyer, A.Kolesnikov, D.Weissenborn, X.Zhai, T.Unterthiner, M.Dehghani, M.Minderer, G.Heigold, S.Gelly, J.Uszkoreit and N.Houlsby : “An image is worth 16x16 words: Transformers for image recognition at scale”, Proceedings of the international conference on learning representations (2021)