Abstract
We assess whether deep convolutional networks (DCNs) can account for one of the most fundamental properties of human vision: the detection and discrimination of elementary image elements (bars) at different contrast levels. The human visual process can be characterized to varying degrees of ‘depth’, ranging from the percentage of correct detections to the detailed tuning and operating characteristics of the underlying perceptual mechanism. We challenge deep networks with the same stimuli and tasks used with human observers, and apply an equivalent characterization of the stimulus-response coupling. In general, we find that popular DCN architectures do not account for signature properties of the human process. At shallow depths of characterization, some combinations of network architecture and training protocol produce human-like trends; however, richer empirical descriptors expose glaring discrepancies. These results urge caution in assessing whether neural networks do or do not capture human behaviour: ultimately, our ability to assess ‘success’ in this area can only be as good as the depth of behavioural characterization against which the network is evaluated. We propose a novel set of metrics and protocols that impose stringent constraints on evaluating whether DCN behaviour is an adequate approximation of biological processes.
Publisher
Cold Spring Harbor Laboratory