Learning Scene Representations for Human-assistive Displays Using Self-attention Networks

Authors:

Jaime Ruiz-Serra [1], Jack White [2], Stephen Petrie [2], Tatiana Kameneva [3], Chris McCarthy [2]

Affiliation:

1. Swinburne University of Technology, Hawthorn, Australia and The University of Sydney, Darlington, Australia

2. Swinburne University of Technology, Hawthorn, Australia

3. Swinburne University of Technology, Hawthorn, Australia and The University of Melbourne, Parkville, Australia

Abstract

Video-see-through (VST) augmented reality (AR) is widely used to present novel augmentative visual experiences by processing video frames for viewers. Among VST AR systems, assistive vision displays aim to compensate for low vision or blindness by presenting enhanced visual information to support activities of daily living for the vision impaired or deprived. Despite progress, current assistive displays suffer from a visual information bottleneck that limits their functional outcomes compared to healthy vision. This motivates the exploration of methods to selectively enhance and augment salient visual information. Traditionally, vision processing pipelines for assistive displays rely on hand-crafted, single-modality filters that cannot adapt to time-varying and environment-dependent needs. This article proposes the use of Deep Reinforcement Learning (DRL) and self-attention (SA) networks as a means to learn vision processing pipelines for assistive displays. SA networks selectively attend to task-relevant features, offering a more parameter- and compute-efficient approach to RL-based task learning. We assess the feasibility of using SA networks in a simulation-trained model to generate relevant representations of real-world states for navigation with prosthetic vision displays. We explore two prosthetic vision applications, vision-to-auditory encoding and retinal prostheses, using simulated phosphene visualisations. This article introduces SA-px, a general-purpose vision processing pipeline using self-attention networks, and SA-phos, a display-specific formulation targeting low-resolution assistive displays. We present novel scene visualisations derived from SA image patch importance rankings to support mobility with prosthetic vision devices. To the best of our knowledge, this is the first application of self-attention networks to the task of learning vision processing pipelines for prosthetic vision or assistive displays.
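As a rough, hypothetical sketch of the data flow the abstract describes (not the authors' SA-px or SA-phos implementation), the Python snippet below ranks image patches by the attention they receive in a single self-attention layer and lights up the top-ranked patch positions as phosphenes on a coarse display grid. The function names, the column-sum importance heuristic, and the random (untrained) weights are all assumptions for illustration; in the paper the representation would be learned with DRL rather than initialised randomly.

    import numpy as np

    def extract_patches(img, patch=16):
        # Split a (H, W) grayscale frame into non-overlapping flattened patches.
        H, W = img.shape
        rows, cols = H // patch, W // patch
        patches = (img[:rows * patch, :cols * patch]
                   .reshape(rows, patch, cols, patch)
                   .transpose(0, 2, 1, 3)
                   .reshape(rows * cols, patch * patch))
        return patches, (rows, cols)

    def patch_importance(patches, d_k=32, seed=0):
        # Single-head self-attention over patches; importance = total attention
        # each patch receives (column sum of the attention matrix).
        rng = np.random.default_rng(seed)
        d_in = patches.shape[1]
        Wq = rng.standard_normal((d_in, d_k)) / np.sqrt(d_in)  # untrained weights;
        Wk = rng.standard_normal((d_in, d_k)) / np.sqrt(d_in)  # DRL would learn these
        Q, K = patches @ Wq, patches @ Wk
        logits = Q @ K.T / np.sqrt(d_k)
        attn = np.exp(logits - logits.max(axis=1, keepdims=True))
        attn /= attn.sum(axis=1, keepdims=True)  # row-wise softmax
        return attn.sum(axis=0)                  # one score per patch

    def render_phosphenes(scores, grid, top_k=20):
        # Light up one phosphene per top-ranked patch on a coarse display grid.
        display = np.zeros(grid)
        for idx in np.argsort(scores)[::-1][:top_k]:
            display[idx // grid[1], idx % grid[1]] = 1.0
        return display

    frame = np.random.rand(128, 128)  # stand-in for a camera frame
    patches, grid = extract_patches(frame)
    display = render_phosphenes(patch_importance(patches), grid)  # 8x8 phosphene map

The mapping from patch ranking to phosphene activation mirrors the low-resolution constraint of retinal prostheses: only the grid positions of the most important patches are stimulated, and the rest of the frame is discarded.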

Funder

SUT

National Collaborative Research Infrastructure Strategy

Publisher

Association for Computing Machinery (ACM)

