1. Baradel F, Wolf C, Mille J (2017) Pose-conditioned spatiotemporal attention for human action recognition. CoRR abs/1703.10106
2. Cao Z, Simon T, Wei S-E, Sheikh Y (2017) Realtime multi-person 2D pose estimation using part affinity fields. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
3. Carreira J, Zisserman A (2017) Quo vadis, action recognition? A new model and the Kinetics dataset. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
4. Du Y, Wang W, Wang L (2015) Hierarchical recurrent neural network for skeleton based action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1110–1118
5. Gu J, Wang G, Chen T (2016) Recurrent highway networks with language CNN for image captioning. arXiv preprint arXiv:1612.07086