Abstract
Behavioral and neural evidence suggests that the fundamental unit of attentional selection for humans is a complete object representation. However, the mechanism underlying object-based attention remains poorly understood. One open question is how high-level object representations can effectively guide bottom-up processing of low-level visual features. Here, we propose that the visual system continuously attempts to generate reconstructions of the objects in an input and that these reconstructions serve as a mechanism for object-based attention. Building on auto-encoder neural networks, we show that reconstructing an object’s appearance and location from its abstract representation enables the visual system to better bind features and locations into hypothesized objects, thus providing a mechanism for generating top-down attentional biases and for selectively routing the low-level features of specific objects. We evaluated our model on the MNIST-C (handwritten digits under corruptions) and ImageNet-C (real-world objects under corruptions) datasets, which pose challenging recognition tasks in which top-down attention must filter out noise and occlusion. Our model not only demonstrated superior performance on these tasks, but also accounted for human behavioral reaction times and error patterns better than a standard feedforward Convolutional Neural Network. Our work suggests that object reconstruction is a biologically plausible and effective object-based attention mechanism that can endow humans with robust object perception and recognition abilities.
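The core idea above, that a top-down reconstruction of a hypothesized object gates which bottom-up features are routed onward, can be sketched minimally as follows. This is an illustrative NumPy toy, not the paper's implementation; the function names, the sigmoid gating, and the `sharpness` parameter are all assumptions made for the sake of the example.

```python
import numpy as np

def reconstruction_attention_mask(recon, sharpness=5.0):
    # Normalize the reconstruction to [0, 1] and pass it through a
    # sigmoid so that reconstructed-object pixels get weights near 1
    # and unexplained pixels get weights near 0. (Illustrative choice.)
    r = (recon - recon.min()) / (recon.max() - recon.min() + 1e-8)
    return 1.0 / (1.0 + np.exp(-sharpness * (r - 0.5)))

def route_features(noisy_input, recon):
    # Gate the noisy bottom-up input with the top-down reconstruction,
    # suppressing clutter the object hypothesis does not account for.
    return noisy_input * reconstruction_attention_mask(recon)

# Toy example: a 2x2 "object" in one corner of a noisy 4x4 image.
rng = np.random.default_rng(0)
recon = np.zeros((4, 4))
recon[:2, :2] = 1.0                          # hypothesized object
noisy = recon + 0.5 * rng.random((4, 4))     # input corrupted by clutter
attended = route_features(noisy, recon)      # object region survives gating
```

In a full model, `recon` would come from the decoder of a trained auto-encoder rather than being hand-specified, and the gated features would feed the next recognition stage.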
Publisher
Cold Spring Harbor Laboratory