1. Deep high-resolution representation learning for human pose estimation;sun;Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,2019
2. An image is worth 16x16 words: Transformers for image recognition at scale;dosovitskiy;arXiv preprint arXiv:2010.11929,2020
3. TesseTrack: End-to-End Learnable Multi-Person Articulated 3D Pose Tracking
4. Simple baselines for human pose estimation and tracking;xiao;Proceedings of the European Conference on Computer Vision,2018
5. ViTPose: Simple vision transformer baselines for human pose estimation;xu;arXiv preprint arXiv:2204.12484,2022