Abstract
Estimating local surface orientation (slant and tilt) is fundamental to recovering the three-dimensional structure of the environment, but it is unknown how well humans perform this task in natural scenes. Here, with a high-fidelity database of natural stereo-images with ground-truth surface orientation at each pixel, we find dramatic differences in human tilt estimation with natural and artificial stimuli. With artificial stimuli, estimates are precise and unbiased. With natural stimuli, estimates are imprecise and strongly biased. An image-computable normative model grounded in natural scene statistics predicts human bias, precision, and trial-by-trial errors without fitting parameters to the human data. These similarities suggest that the complex human performance patterns with natural stimuli are lawful, and that human visual systems have internalized local image and scene statistics to optimally infer the three-dimensional structure of the environment. The current results help generalize our understanding of human vision from the lab to the real world.
Publisher
Cold Spring Harbor Laboratory