Authors:
Fadaei, Amir Hosein; Dehaqani, Mohammad-Reza A.
Abstract
Traditionally, vision models have relied predominantly on spatial features extracted from static images, departing from the continuous stream of spatiotemporal features the brain processes in natural vision. While numerous video-understanding models have emerged, incorporating videos with spatiotemporal features into image-understanding models has remained limited. Drawing inspiration from natural vision, which exhibits remarkable resilience to input changes, our research focuses on developing a brain-inspired model for vision understanding trained on videos. Our findings demonstrate that models trained on videos instead of still images, and that include temporal features, become more resilient to various alterations of the input media.
Publisher
Springer Science and Business Media LLC