Author:
Sun Jennifer J., Zhou Hao, Zhao Long, Yuan Liangzhe, Seybold Bryan, Hendon David, Schroff Florian, Ross David A., Adam Hartwig, Hu Bo, Liu Ting
Abstract
Computational approaches leveraging computer vision and machine learning have transformed the quantification of animal behavior from video. However, existing methods often rely on task-specific features or models, which struggle to generalize across diverse datasets and tasks. Recent advances in machine learning, particularly the emergence of vision foundation models, i.e., large-scale models pre-trained on massive, diverse visual repositories, offer a way to tackle these challenges. Here, we investigate the potential of frozen video foundation models across a range of behavior analysis tasks, including classification, retrieval, and localization. We use a single, frozen model to extract general-purpose representations from video data, and perform extensive evaluations on diverse open-source animal behavior datasets. Our results demonstrate that features from foundation models, with minimal adaptation, achieve performance competitive with existing methods specifically designed for each dataset, across species, behaviors, and experimental contexts. This highlights the potential of frozen video foundation models as a powerful and accessible backbone for automated behavior analysis, with the ability to accelerate research across diverse fields, from neuroscience to ethology to ecology.
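The "frozen backbone with minimal adaptation" setup described above can be illustrated with a minimal sketch. Everything in it is an assumption for illustration only: the encoder is a hypothetical stand-in (a fixed random projection, not a real video foundation model), the clips are synthetic, and the ridge-regression probe is one simple choice of lightweight adaptation, not necessarily the one used by the authors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a frozen video encoder: a fixed random projection
# applied per frame, then mean-pooled over time. Its weights are never updated.
FRAME_DIM, EMBED_DIM = 64, 16
FROZEN_W = rng.normal(size=(FRAME_DIM, EMBED_DIM)) / np.sqrt(FRAME_DIM)

def extract_features(clips):
    """Map clips of shape (n_clips, n_frames, FRAME_DIM) to (n_clips, EMBED_DIM)."""
    return np.tanh(clips @ FROZEN_W).mean(axis=1)

def add_bias(features):
    """Append a constant column so the linear probe can learn an offset."""
    return np.hstack([features, np.ones((len(features), 1))])

def fit_linear_probe(features, labels, n_classes, l2=1e-3):
    """Closed-form ridge regression onto one-hot labels: the only trained part."""
    x = add_bias(features)
    y = np.eye(n_classes)[labels]
    return np.linalg.solve(x.T @ x + l2 * np.eye(x.shape[1]), x.T @ y)

def predict(features, weights):
    """Pick the class with the highest linear score."""
    return (add_bias(features) @ weights).argmax(axis=1)

def make_clips(n, label):
    """Synthetic 8-frame clips; class 1 frames have a shifted mean."""
    return rng.normal(loc=1.0 * label, size=(n, 8, FRAME_DIM))

train_x = np.concatenate([make_clips(100, 0), make_clips(100, 1)])
train_y = np.array([0] * 100 + [1] * 100)
test_x = np.concatenate([make_clips(50, 0), make_clips(50, 1)])
test_y = np.array([0] * 50 + [1] * 50)

# Features are computed once with the frozen encoder; only the probe is fitted.
train_features = extract_features(train_x)
probe = fit_linear_probe(train_features, train_y, n_classes=2)
accuracy = (predict(extract_features(test_x), probe) == test_y).mean()
print(f"linear-probe accuracy on synthetic clips: {accuracy:.2f}")
```

The design point is that the expensive model runs in inference mode only, so features can be cached and reused across classification, retrieval, and localization heads.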
Publisher
Cold Spring Harbor Laboratory