SyDog-Video: A Synthetic Dog Video Dataset for Temporal Pose Estimation-Reference-Cited by-同舟云学术

SyDog-Video: A Synthetic Dog Video Dataset for Temporal Pose Estimation

Published:2023-12-29 Issue: Volume: Page:
ISSN:0920-5691
Container-title:International Journal of Computer Vision
language:en
Short-container-title:Int J Comput Vis

Author:

Shooter Moira^ORCID,Malleson Charles,Hilton Adrian

Abstract

AbstractWe aim to estimate the pose of dogs from videos using a temporal deep learning model as this can result in more accurate pose predictions when temporary occlusions or substantial movements occur. Generally, deep learning models require a lot of data to perform well. To our knowledge, public pose datasets containing videos of dogs are non existent. To solve this problem, and avoid manually labelling videos as it can take a lot of time, we generated a synthetic dataset containing 500 videos of dogs performing different actions using Unity3D. Diversity is achieved by randomising parameters such as lighting, backgrounds, camera parameters and the dog’s appearance and pose. We evaluate the quality of our synthetic dataset by assessing the model’s capacity to generalise to real data. Usually, networks trained on synthetic data perform poorly when evaluated on real data, this is due to the domain gap. As there was still a domain gap after improving the quality of the synthetic dataset and inserting diversity, we bridged the domain gap by applying 2 different methods: fine-tuning and using a mixed dataset to train the network. Additionally, we compare the model pre-trained on synthetic data with models pre-trained on a real-world animal pose datasets. We demonstrate that using the synthetic dataset is beneficial for training models with (small) real-world datasets. Furthermore, we show that pre-training the model with the synthetic dataset is the go to choice rather than pre-training on real-world datasets for solving the pose estimation task from videos of dogs.

Funder

Leverhulme Trust

Engineering and Physical Sciences Research Council

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence,Computer Vision and Pattern Recognition,Software

Link

https://link.springer.com/content/pdf/10.1007/s11263-023-01946-z.pdf

Reference59 articles.

1. Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, P., Toderici, G., Varadarajan, B., & Vijayanarasimhan, S. (2016). Youtube-8m: A large-scale video classification benchmark. CoRR. Retrieved from arXiv:1609.08675

2. Adobe. (2022). Mixamo get animated. Animate 3d characters for games, film, and more. https://www.mixamo.com/

3. Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019). Optuna: A next-generation hyperparameter optimization framework. CoRR. Retrieved from arXiv:1907.10902

4. Alhaija, H. A., Mustikovela, S. K., Mescheder, L. M., Geiger, A., & Rother, C. (2017). Augmented reality meets computer vision: Efficient data generation for urban driving scenes. CoRR. Retrieved from arXiv:1708.01566

5. Biggs, B., Boyne, O., Charles, J., Fitzgibbon, A., & Cipolla, R. (2020). Who left the dogs out? 3D animal reconstruction with expectation maximization in the loop.