Abstract
This paper lists twelve conditions that must be fulfilled by a satisfactory theory of visual pattern recognition in animals and man: such a theory must explain (1) size invariance, (2) position invariance, (3) brightness invariance, (4) the equivalence of outline and filled-in shapes, (5) lack of invariance under most rotations, (6) the known confusions made between patterns, (7) animals’ ability to disregard jitter, (8) the human ability to segment shapes in different ways, (9) the way in which man recognizes complex scenes without recognizing individual details of the scene, (10) perceptual learning, and (11) the way in which the brain takes advantage of the redundancy of the visual environment; finally, (12) any theory must be consistent with the physiological evidence. The outlines of a theory meeting these conditions are put forward. The model has three parts: (1) a processor that extracts local features from the input picture while preserving information about the spatial relationships between the features; (2) a mechanism that induces an abstract description of the output from the processor; (3) a store in which such descriptions are held. Some suggestions are made about the language in which descriptions are framed: it must contain hierarchical elements to allow for the possibility of segmenting a picture in different ways. It is shown how the properties of such a language could account for the facts listed above. In recognizing a picture, the output from the processor is matched to a stored descriptive rule. What we see is the descriptive rule selected to describe an input picture; we cannot respond to details of the picture not represented in the rule. Since, however, many different rules describe the same input picture, animals and man respond to different aspects of the same picture on different occasions of presentation.
Cited by
200 articles.