Abstract
What underlies the emergence of cortex-aligned representations in deep neural network models of vision? The success of widely varied architectures has motivated the prevailing hypothesis that large-scale pre-training is the primary factor underlying the similarities between brains and neural networks. Here, we challenge this view by revealing the role of architectural inductive biases in models with minimal training. We examined networks with varied architectures but no pre-training and quantified their ability to predict image representations in the visual cortices of both monkeys and humans. We found that cortex-aligned representations emerge in convolutional architectures that combine two key manipulations of dimensionality: compression in the spatial domain and expansion in the feature domain. We further show that the inductive biases of convolutional architectures are critical for obtaining performance gains from feature expansion—dimensionality manipulations were relatively ineffective in other architectures and in convolutional models with targeted lesions. Our findings suggest that the architectural constraints of convolutional networks are sufficiently close to the constraints of biological vision to allow many aspects of cortical visual representation to emerge even before synaptic connections have been tuned through experience.
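The architectural motif described above can be illustrated with a minimal sketch: an untrained convolutional network whose stages compress spatial resolution while expanding the number of feature channels. This is not the authors' exact model; the class name, layer count, and channel widths below are assumptions chosen only to make the compression/expansion idea concrete.

```python
# Illustrative sketch (assumed architecture, not the paper's model):
# each stage halves spatial resolution (stride-2 convolution) while
# increasing the number of feature channels. Weights are left at their
# random initialization, i.e., no pre-training.
import torch
import torch.nn as nn

class CompressExpandNet(nn.Module):
    def __init__(self, channels=(3, 64, 256, 1024)):  # channel widths are assumptions
        super().__init__()
        layers = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [
                nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),  # spatial compression
                nn.ReLU(inplace=True),                                        # feature expansion via wider c_out
            ]
        self.features = nn.Sequential(*layers)

    def forward(self, x):
        return self.features(x)

# Untrained model: activations from randomly initialized weights.
model = CompressExpandNet().eval()
with torch.no_grad():
    activations = model(torch.randn(1, 3, 224, 224))
print(activations.shape)  # torch.Size([1, 1024, 28, 28]): fewer spatial positions, many more features
```

In a study like this, activations from such an untrained network would then be compared against cortical responses (e.g., via regression-based encoding models), but that evaluation step is outside the scope of this sketch.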