Affiliation:
1. Swinburne University of Technology, Hawthorn, Australia and The University of Sydney, Darlington, Australia
2. Swinburne University of Technology, Hawthorn, Australia
3. Swinburne University of Technology, Hawthorn, Australia and The University of Melbourne, Parkville, Australia
Abstract
Video-see-through (VST) augmented reality (AR) is widely used to present novel augmentative visual experiences by processing video frames for viewers. Among VST AR systems, assistive vision displays aim to compensate for low vision or blindness, presenting enhanced visual information to support activities of daily living for the vision impaired/deprived. Despite progress, current assistive displays suffer from a visual information bottleneck, limiting their functional outcomes compared to healthy vision. This motivates the exploration of methods to selectively enhance and augment salient visual information. Traditionally, vision processing pipelines for assistive displays rely on hand-crafted, single-modality filters, lacking adaptability to time-varying and environment-dependent needs. This article proposes the use of Deep Reinforcement Learning (DRL) and self-attention (SA) networks as a means to learn vision processing pipelines for assistive displays. SA networks selectively attend to task-relevant features, offering a more parameter- and compute-efficient approach to RL-based task learning. We assess the feasibility of using SA networks in a simulation-trained model to generate relevant representations of real-world states for navigation with prosthetic vision displays. We explore two prosthetic vision applications, vision-to-auditory encoding and retinal prostheses, using simulated phosphene visualisations. This article introduces SA-px, a general-purpose vision processing pipeline using self-attention networks, and SA-phos, a display-specific formulation targeting low-resolution assistive displays. We present novel scene visualisations derived from SA image patch importance rankings to support mobility with prosthetic vision devices. To the best of our knowledge, this is the first application of self-attention networks to the task of learning vision processing pipelines for prosthetic vision or assistive displays.
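The sketch below illustrates the general idea described in the abstract: scoring image patches with a single self-attention head and mapping the most important patches onto a coarse grid, as a stand-in for a low-resolution assistive display. It is a minimal, untrained illustration only; the function names, patch size, random projections, and the on/off grid rendering are assumptions for exposition and do not reproduce the paper's SA-px or SA-phos pipelines.

```python
import numpy as np

def extract_patches(frame, patch=8, stride=8):
    """Split a grayscale frame into non-overlapping patches, flattened to vectors."""
    H, W = frame.shape
    rows = (H - patch) // stride + 1
    cols = (W - patch) // stride + 1
    patches = np.stack([
        frame[r * stride:r * stride + patch, c * stride:c * stride + patch].ravel()
        for r in range(rows) for c in range(cols)
    ])
    return patches, (rows, cols)

def patch_importance(patches, d_k=16, rng=None):
    """Score patches with one self-attention head: the importance of patch i is
    the total attention it receives from all patches (column sum of the attention matrix).
    Projections are random here; in practice they would be learned (e.g. via RL)."""
    rng = np.random.default_rng(0) if rng is None else rng
    d_in = patches.shape[1]
    Wq = rng.normal(scale=d_in ** -0.5, size=(d_in, d_k))  # query projection (untrained)
    Wk = rng.normal(scale=d_in ** -0.5, size=(d_in, d_k))  # key projection (untrained)
    Q, K = patches @ Wq, patches @ Wk
    logits = Q @ K.T / np.sqrt(d_k)
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)                 # row-wise softmax
    return attn.sum(axis=0)

def importance_grid(importance, grid, top_k=10):
    """Render the top-k most important patches as an on/off low-resolution grid,
    a crude proxy for a phosphene-style display."""
    mask = np.zeros_like(importance)
    mask[np.argsort(importance)[-top_k:]] = 1.0
    return mask.reshape(grid)

# Example: rank patches of a random 64x64 frame and print the coarse importance grid.
frame = np.random.rand(64, 64)
patches, grid = extract_patches(frame)
scores = patch_importance(patches)
print(importance_grid(scores, grid))
```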
Funder
SUT
National Collaborative Research Infrastructure Strategy
Publisher
Association for Computing Machinery (ACM)