PAV-SOD: A New Task towards Panoramic Audiovisual Saliency Detection

Authors:

Yi Zhang¹, Fang-Yi Chao², Wassim Hamidouche³, Olivier Deforges³

Affiliations:

1. Univ Rennes, INSA Rennes, CNRS, IETR (UMR 6164), France

2. Trinity College Dublin, Ireland

3. Univ Rennes, INSA Rennes, CNRS, IETR (UMR 6164), France

Abstract

Object-level audiovisual saliency detection in 360° panoramic real-life dynamic scenes is important for exploring and modeling human perception in immersive environments, and for aiding the development of virtual, augmented, and mixed reality applications in fields such as education, social networking, entertainment, and training. To this end, we propose a new task, panoramic audiovisual salient object detection (PAV-SOD), which aims to segment the objects grasping the most human attention in 360° panoramic videos reflecting real-life daily scenes. To support the task, we collect PAVS10K, the first panoramic video dataset for audiovisual salient object detection, which consists of 67 4K-resolution equirectangular videos with per-video labels, including hierarchical scene categories and associated attributes depicting specific challenges for conducting PAV-SOD, and 10,465 uniformly sampled video frames with manually annotated object-level and instance-level pixel-wise masks. The coarse-to-fine annotations enable multi-perspective analysis regarding PAV-SOD modeling. We further systematically benchmark 13 state-of-the-art salient object detection (SOD)/video object segmentation (VOS) methods on our PAVS10K. In addition, we propose a new baseline network that takes advantage of both visual and audio cues of 360° video frames by using a new conditional variational auto-encoder (CVAE). Our CVAE-based audiovisual network, namely CAV-Net, consists of a spatial-temporal visual segmentation network, a convolutional audio-encoding network, and audiovisual distribution estimation modules. As a result, our CAV-Net outperforms all competing models and is able to estimate the aleatoric uncertainties within PAVS10K. From extensive experimental results, we gain several findings about PAV-SOD challenges and insights into PAV-SOD model interpretability. We hope that our work can serve as a starting point for advancing SOD towards immersive media.
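To make the architecture described above concrete, the following is a minimal, hypothetical PyTorch sketch of the general CVAE-style audiovisual fusion idea: a prior conditioned on fused visual and audio features, a posterior additionally conditioned on the ground-truth mask during training, and a decoder whose saliency prediction is modulated by the sampled latent code. All class names, channel sizes, and the latent dimension below are illustrative assumptions and do not reproduce the authors' actual CAV-Net implementation.

# Hypothetical illustration only -- not the authors' CAV-Net code.
import torch
import torch.nn as nn


class LatentEncoder(nn.Module):
    """Maps a fused feature map to the mean and log-variance of a Gaussian latent."""

    def __init__(self, in_channels: int, latent_dim: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc_mu = nn.Linear(64, latent_dim)
        self.fc_logvar = nn.Linear(64, latent_dim)

    def forward(self, x):
        h = self.conv(x).flatten(1)
        return self.fc_mu(h), self.fc_logvar(h)


class AudioVisualCVAE(nn.Module):
    """Toy conditional VAE: the prior sees fused audiovisual features; the
    posterior additionally sees the ground-truth saliency mask (training only)."""

    def __init__(self, vis_channels=256, aud_channels=128, latent_dim=8):
        super().__init__()
        fused = vis_channels + aud_channels
        self.prior_net = LatentEncoder(fused, latent_dim)
        self.posterior_net = LatentEncoder(fused + 1, latent_dim)  # +1 for the mask
        self.decoder = nn.Sequential(
            nn.Conv2d(fused + latent_dim, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),  # per-pixel saliency logits
        )

    @staticmethod
    def reparameterize(mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, vis_feat, aud_feat, gt_mask=None):
        # vis_feat: (B, C_v, H, W) equirectangular frame features;
        # aud_feat: (B, C_a) pooled audio embedding (assumed shapes).
        b, _, h, w = vis_feat.shape
        aud_map = aud_feat.view(b, -1, 1, 1).expand(-1, -1, h, w)
        fused = torch.cat([vis_feat, aud_map], dim=1)

        mu_p, logvar_p = self.prior_net(fused)
        if gt_mask is not None:  # training: sample from the posterior
            mu_q, logvar_q = self.posterior_net(torch.cat([fused, gt_mask], dim=1))
            z = self.reparameterize(mu_q, logvar_q)
        else:                    # inference: sample from the prior
            mu_q = logvar_q = None
            z = self.reparameterize(mu_p, logvar_p)

        z_map = z.view(b, -1, 1, 1).expand(-1, -1, h, w)
        logits = self.decoder(torch.cat([fused, z_map], dim=1))
        return logits, (mu_p, logvar_p), (mu_q, logvar_q)


# Example usage with dummy tensors:
# model = AudioVisualCVAE()
# logits, prior_stats, _ = model(torch.randn(2, 256, 32, 64), torch.randn(2, 128))

At inference time, drawing several latent samples from the prior produces a set of saliency maps whose per-pixel variance can be read as an uncertainty estimate, consistent with the aleatoric uncertainty estimation mentioned in the abstract.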

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture


Cited by 2 articles.

1. D-SAV360: A Dataset of Gaze Scanpaths on 360° Ambisonic Videos. IEEE Transactions on Visualization and Computer Graphics, November 2023.

2. A Survey on 360° Images and Videos in Mixed Reality: Algorithms and Applications. Journal of Computer Science and Technology, 30 May 2023.
