Animal Pose Tracking: 3D Multimodal Dataset and Token-based Pose Optimization

Published: 2022-11-23 | Volume: 131 | Issue: 2 | Pages: 514-530
ISSN: 0920-5691
Journal: International Journal of Computer Vision (Int J Comput Vis)
Language: English

Authors

Mahir Patel, Yiwen Gu, Lucas C. Carstensen, Michael E. Hasselmo, Margrit Betke

Abstract

Accurate tracking of the 3D pose of animals from video recordings is critical for many behavioral studies, yet there is a dearth of publicly available datasets that the computer vision community could use for model development. Here we introduce the Rodent3D dataset, which records animals exploring their environment and/or interacting with each other using multiple cameras and modalities (RGB, depth, and thermal infrared). Rodent3D consists of 200 min of multimodal video recordings from up to three thermal and three RGB-D synchronized cameras (approximately 4 million frames). For the task of optimizing estimates of pose sequences provided by existing pose estimation methods, we provide a baseline model called OptiPose. While deep-learned attention mechanisms have been used for pose estimation in the past, OptiPose takes a different approach: it represents 3D poses as tokens, so that deep-learned context models can attend to both spatial and temporal keypoint patterns. Our experiments show that OptiPose is highly robust to noise and occlusion and can be used to optimize pose sequences provided by state-of-the-art models for animal pose estimation.
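The idea of treating 3D keypoints as tokens that attend to spatial context (other keypoints in the same frame) and temporal context (the same keypoint in other frames) can be illustrated with a minimal sketch. This is not OptiPose's architecture: the function names, the one-token-per-keypoint-per-frame layout, the same-frame-or-same-keypoint attention mask, and the use of identity query/key/value projections in place of learned weights are all assumptions made purely for illustration.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Token = Tuple[int, int, List[float]]  # (frame index, keypoint index, xyz features)


def tokenize(poses: List[List[Point3D]]) -> List[Token]:
    """Flatten a T-frame x K-keypoint pose sequence into T*K tokens (frame-major order).

    Hypothetical token layout chosen for illustration; not taken from the paper.
    """
    return [(t, k, [x, y, z]) for t, frame in enumerate(poses) for k, (x, y, z) in enumerate(frame)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatio_temporal_attention(tokens: List[Token]) -> List[List[float]]:
    """One scaled dot-product self-attention step over pose tokens.

    Assumptions (illustrative only, not OptiPose): identity query/key/value
    projections instead of learned weights, and a mask that lets token (t, k)
    attend only to tokens in the same frame t (spatial context) or with the
    same keypoint index k (temporal context).
    """
    d = len(tokens[0][2])
    outputs = []
    for t_i, k_i, query in tokens:
        # Indices of tokens this query may attend to (always includes itself).
        visible = [j for j, (t_j, k_j, _) in enumerate(tokens) if t_j == t_i or k_j == k_i]
        scores = [sum(q * v for q, v in zip(query, tokens[j][2])) / math.sqrt(d) for j in visible]
        weights = softmax(scores)
        # Output is a convex combination of the visible tokens' features.
        outputs.append([sum(w * tokens[j][2][c] for w, j in zip(weights, visible)) for c in range(d)])
    return outputs


if __name__ == "__main__":
    # Two frames of a hypothetical three-keypoint skeleton (coordinates are made up).
    poses = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
        [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.1, 0.0)],
    ]
    tokens = tokenize(poses)
    for (t, k, _), refined in zip(tokens, spatio_temporal_attention(tokens)):
        print(t, k, [round(c, 3) for c in refined])
```

Each output coordinate is a softmax-weighted average over the keypoints in the same frame and the same keypoint's track across frames. A trained context model would presumably replace the identity projections with learned ones and stack several such layers; the explicit mask here only makes the spatial/temporal split visible.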

Funder

Office of Naval Research

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence,Computer Vision and Pattern Recognition,Software

Link

https://link.springer.com/content/pdf/10.1007/s11263-022-01714-5.pdf

References (54 articles)

1. Alexander, A. S., Carstensen, L. C., Hinman, J. R., Raudies, F., Chapman, G. W., & Hasselmo, M. E. (2020). Egocentric boundary vector tuning of the retrosplenial cortex. Science Advances, 6(8), eaaz2322.

2. Biggs, B., Boyne, O., Charles, J., Fitzgibbon, A., & Cipolla, R. (2020). Who left the dogs out: 3D animal reconstruction with expectation maximization in the loop. In Proceedings of the 16th European Conference on Computer Vision (ECCV), Glasgow, UK, August 23-28, 2020, Part XI.

3. Breslav, M., Hedrick, T. L., Sclaroff, S., & Betke, M. (2016). Discovering useful parts for pose estimation in sparsely annotated datasets. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY.