Scale-Invariant Scale-Channel Networks: Deep Networks That Generalise to Previously Unseen Scales

Published: 2022-04-11 Issue: 5 Volume: 64 Page: 506-536
ISSN: 0924-9907
Container-title: Journal of Mathematical Imaging and Vision
Language: en
Short-container-title: J Math Imaging Vis

Author:

Jansson Ylva, Lindeberg Tony

Abstract

The ability to handle large scale variations is crucial for many real-world visual tasks. A straightforward approach for handling scale in a deep network is to process an image at several scales simultaneously in a set of scale channels. Scale invariance can then, in principle, be achieved by using weight sharing between the scale channels together with max or average pooling over the outputs from the scale channels. The ability of such scale-channel networks to generalise to scales not present in the training set over significant scale ranges has, however, not previously been explored. In this paper, we present a systematic study of this methodology by implementing different types of scale-channel networks and evaluating their ability to generalise to previously unseen scales. We develop a formalism for analysing the covariance and invariance properties of scale-channel networks, including exploring their relations to scale-space theory, and exploring how different design choices, unique to scaling transformations, affect the overall performance of scale-channel networks. We first show that two previously proposed scale-channel network designs, in one case, generalise no better than a standard CNN to scales not present in the training set, and in the second case, have limited scale generalisation ability. We explain theoretically and demonstrate experimentally why generalisation fails or is limited in these cases. We then propose a new type of foveated scale-channel architecture, where the scale channels process increasingly larger parts of the image with decreasing resolution. This new type of scale-channel network is shown to generalise extremely well, provided sufficient image resolution and the absence of boundary effects. Our proposed FovMax and FovAvg networks perform almost identically over a scale range of 8, even when trained on single-scale data, and also give improved performance when learning from data sets with large scale variations in the small-sample regime.
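
The scale-channel idea described in the abstract (shared weights across channels, followed by max or average pooling over the channel outputs) can be illustrated with a toy 1D sketch. This is not the authors' implementation: the function names, the box-shaped test pattern, the template-matching "feature", and the subsampling without smoothing are all assumptions made for illustration only. Each channel samples a wider part of the signal at a coarser stride, loosely mirroring the foveated design.

```python
# Toy 1D sketch of scale channels with shared weights and pooling over scales.
# All names are hypothetical; this is not the architecture from the paper.

def make_signal(scale, length=65):
    """Centred box pattern whose half-width (2 * scale) grows with `scale`."""
    centre = length // 2
    return [1.0 if abs(i - centre) <= 2 * scale else 0.0 for i in range(length)]


def foveated_channel(signal, channel_scale, radius=4):
    """Take 2 * radius + 1 samples around the centre with stride `channel_scale`.

    Larger channel scales see a wider part of the signal at lower resolution.
    No smoothing is applied before subsampling (a simplification).
    """
    centre = len(signal) // 2
    return [signal[centre + channel_scale * j] for j in range(-radius, radius + 1)]


# Shared "weights": one template that every channel reuses (weight sharing).
TEMPLATE = [1.0 if abs(j) <= 2 else 0.0 for j in range(-4, 5)]


def shared_feature(patch):
    """Negative squared distance to the template; zero means a perfect match."""
    return 0.0 - sum((p - t) ** 2 for p, t in zip(patch, TEMPLATE))


def scale_channel_outputs(signal, channel_scales=(1, 2, 4, 8)):
    """Apply the same feature to every scale channel."""
    return [shared_feature(foveated_channel(signal, k)) for k in channel_scales]


def pool_max(signal, channel_scales=(1, 2, 4, 8)):
    """Max pooling over the scale-channel outputs."""
    return max(scale_channel_outputs(signal, channel_scales))


def pool_avg(signal, channel_scales=(1, 2, 4, 8)):
    """Average pooling over the scale-channel outputs."""
    outputs = scale_channel_outputs(signal, channel_scales)
    return sum(outputs) / len(outputs)


if __name__ == "__main__":
    for s in (1, 2, 4, 8):
        signal = make_signal(s)
        print(s, scale_channel_outputs(signal), pool_max(signal), pool_avg(signal))
```

In this toy setting the max-pooled response is the same for every input scale covered by the channels, because some channel always sees the pattern at its reference size, whereas a single channel on its own responds differently as the input scale changes. Average pooling in this sketch is affected by the boundary of the channel range.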

Funder

Vetenskapsrådet

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Geometry and Topology, Computer Vision and Pattern Recognition, Condensed Matter Physics, Modeling and Simulation, Statistics and Probability

Link

https://link.springer.com/content/pdf/10.1007/s10851-022-01082-2.pdf

References: 122 articles.

1. Biederman, I., Cooper, E.E.: Size invariance in visual object priming. J. Exp. Psychol. Hum. Percept. Perform. 18, 121–133 (1992)

2. Logothetis, N.K., Pauls, J., Poggio, T.: Shape representation in the inferior temporal cortex of monkeys. Curr. Biol. 5, 552–563 (1995)

3. Ito, M., Tamura, H., Fujita, I., Tanaka, K.: Size and position invariance of neuronal responses in monkey inferotemporal cortex. J. Neurophysiol. 73, 218–226 (1995)

4. Furmanski, C.S., Engel, S.A.: Perceptual learning in object recognition: object specificity and size invariance. Vis. Res. 40, 473–484 (2000)

5. Hung, C.P., Kreiman, G., Poggio, T., DiCarlo, J.J.: Fast readout of object identity from macaque inferior temporal cortex. Science 310, 863–866 (2005)

Cited by 6 articles.

1. Multi-Scale Geo-Localization Based on Local Similarity Area Distance Measurement Method;IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium;2023-07-16

2. Just a Matter of Scale? Reevaluating Scale Equivariance in Convolutional Neural Networks;2023 International Joint Conference on Neural Networks (IJCNN);2023-06-18

3. Covariance properties under natural image transformations for the generalised Gaussian derivative model for visual receptive fields;Frontiers in Computational Neuroscience;2023-06-15

4. Internally generated time in the rodent hippocampus is logarithmically compressed;eLife;2022-10-17

5. Computer vision models for comparing spatial patterns: understanding spatial scale;International Journal of Geographical Information Science;2022-07-27