Understanding 3D vision as a policy network

Published: 2022-12-13 Volume: 378 Issue: 1869
ISSN: 0962-8436
Journal: Philosophical Transactions of the Royal Society B: Biological Sciences
Language: English
Journal abbreviation: Phil. Trans. R. Soc. B

Author:

Andrew Glennerster¹

Affiliation:

1. School of Psychology and Clinical Language Sciences, University of Reading, RG6 6AL Reading, UK

Abstract

It is often assumed that the brain builds 3D coordinate frames: retinal coordinates (with binocular disparity giving the third dimension), head-centred, body-centred and world-centred coordinates. This paper questions that assumption and begins to sketch an alternative based, essentially, on a set of reflexes. ‘Policy network’ is a term used in reinforcement learning to describe the set of actions that an agent generates depending on its current state. This is an unusual starting point for describing 3D vision, but a policy network can serve as a useful representation both of the 3D layout of a scene and of the location of the observer within it. It avoids 3D reconstruction of the type used in computer vision but is similar to recent representations for navigation generated through reinforcement learning. A policy network for saccades (pure rotations of the camera/eye) is a logical starting point for understanding (i) an ego-centric representation of space (e.g. Marr’s 2½-D sketch; Marr 1982, Vision: a computational investigation into the human representation and processing of visual information) and (ii) a hierarchical, compositional representation for navigation. The potential neural implementation of policy networks is straightforward: a network with a large range of sensory and task-related inputs, such as the cerebellum, would be capable of implementing this input/output function. This is not the case for 3D coordinate transformations in the brain: no neurally implementable proposals have yet been put forward that could carry out a transformation of a visual scene from retinal to world-based coordinates. Hence, if the representation underlying 3D vision can be described as a policy network (in which the actions are either saccades or head translations), this would be a significant step towards a neurally plausible model of 3D vision. This article is part of the theme issue ‘New approaches to 3D vision’.
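To make the idea of a saccade policy concrete, the toy Python sketch below stores, for each pair (currently fixated feature, goal feature), the eye rotation that brings the goal onto the fovea. It is an illustration under stated assumptions, not the author's model: the class `SaccadePolicy`, the feature names and the encoding of a saccade as a change in gaze direction (d_azimuth, d_elevation) in degrees are all hypothetical. Because each saccade is defined as a difference in gaze-direction coordinates, chaining two saccades through an intermediate feature simply sums their components, giving a minimal example of the compositional property the abstract mentions. Only saccades are modelled; head translations are left out.

```python
from typing import Dict, Tuple

# Hypothetical encoding (an assumption, not from the paper): a saccade is the
# change in gaze direction (d_azimuth, d_elevation), in degrees.
Saccade = Tuple[float, float]


class SaccadePolicy:
    """Toy tabular policy: (fixated feature, goal feature) -> saccade."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[str, str], Saccade] = {}

    def learn(self, fixated: str, goal: str, saccade: Saccade) -> None:
        # Record the saccade that moved gaze from `fixated` to `goal`,
        # plus the reverse saccade, which is the same change negated.
        self.table[(fixated, goal)] = saccade
        self.table[(goal, fixated)] = (-saccade[0], -saccade[1])

    def act(self, fixated: str, goal: str) -> Saccade:
        # No eye movement is needed if the goal is already fixated.
        if fixated == goal:
            return (0.0, 0.0)
        # Direct lookup: a stored "reflex" for this state.
        if (fixated, goal) in self.table:
            return self.table[(fixated, goal)]
        # Otherwise chain two stored saccades through one intermediate
        # feature; changes in (azimuth, elevation) add when chained.
        for (start, mid), first in self.table.items():
            if start == fixated and (mid, goal) in self.table:
                second = self.table[(mid, goal)]
                return (first[0] + second[0], first[1] + second[1])
        raise KeyError(f"no saccade known from {fixated!r} to {goal!r}")
```

For example, after `p.learn("door", "window", (10.0, 2.0))` and `p.learn("window", "lamp", (-4.0, 5.0))`, calling `p.act("door", "lamp")` returns `(6.0, 7.0)` even though that pair was never experienced directly. The scene layout is never stored as 3D coordinates here; it is implicit in which action the policy returns for each state.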

Funder

Arts and Humanities Research Council

Engineering and Physical Sciences Research Council

Publisher

The Royal Society

Subject

General Agricultural and Biological Sciences; General Biochemistry, Genetics and Molecular Biology

Link

https://royalsocietypublishing.org/doi/pdf/10.1098/rstb.2021.0448

Cited by 3 articles

1. Binocular receptive-field construction in the primary visual cortex; Current Biology; 2024-06

2. Scene context automatically drives predictions of object transformations; Cognition; 2023-09

3. New Approaches to 3D Vision; Philosophical Transactions of the Royal Society B: Biological Sciences; 2022-12-13