Understanding 3D vision as a policy network

Author:

Glennerster Andrew1ORCID

Affiliation:

1. School of Psychology and Clinical Language Sciences, University of Reading, RG6 6AL Reading, UK

Abstract

It is often assumed that the brain builds 3D coordinate frames, in retinal coordinates (with binocular disparity giving the third dimension), head-centred, body-centred and world-centred coordinates. This paper questions that assumption and begins to sketch an alternative based on, essentially, a set of reflexes. A ‘policy network’ is a term used in reinforcement learning to describe the set of actions that are generated by an agent depending on its current state. This is an untypical starting point for describing 3D vision, but a policy network can serve as a useful representation both for the 3D layout of a scene and the location of the observer within it. It avoids 3D reconstruction of the type used in computer vision but is similar to recent representations for navigation generated through reinforcement learning. A policy network for saccades (pure rotations of the camera/eye) is a logical starting point for understanding (i) an ego-centric representation of space (e.g. Marr’s (Marr 1982 Vision: a computational investigation into the human representation and processing of visual information ) 2 1 2 -D sketch) and (ii) a hierarchical, compositional representation for navigation. The potential neural implementation of policy networks is straightforward; a network with a large range of sensory and task-related inputs such as the cerebellum would be capable of implementing this input/output function. This is not the case for 3D coordinate transformations in the brain: no neurally implementable proposals have yet been put forward that could carry out a transformation of a visual scene from retinal to world-based coordinates. Hence, if the representation underlying 3D vision can be described as a policy network (in which the actions are either saccades or head translations), this would be a significant step towards a neurally plausible model of 3D vision. This article is part of the theme issue ‘New approaches to 3D vision’.

Funder

Arts and Humanities Research Council

Engineering and Physical Sciences Research Council

Publisher

The Royal Society

Subject

General Agricultural and Biological Sciences,General Biochemistry, Genetics and Molecular Biology

Cited by 2 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Scene context automatically drives predictions of object transformations;Cognition;2023-09

2. New Approaches to 3D Vision;Philosophical Transactions of the Royal Society B: Biological Sciences;2022-12-13

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3