A hierarchical occupancy network with multi‐height attention for vision‐centric 3D occupancy prediction

Author:

Li Can (1, 2), Gao Zhi (1, 2), Lin Zhipeng (3), Ye Tonghui (1), Li Ziyao (1)

Affiliation:

1. School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China

2. Hubei Luojia Laboratory, Wuhan, China

3. Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China

Abstract

Vision‐centric 3D occupancy prediction, which models the real world as a voxel‐wise representation from visual inputs alone, has attracted increasing attention because of its precise geometric representation and its ability to handle long‐tail targets. Despite notable progress in this field, many prior and concurrent approaches simply adopt existing spatial cross‐attention (SCA) as their 2D–3D transformation module, which may lead to information coupling or compromise the global receptive field along the height dimension. To overcome these limitations, we propose a hierarchical occupancy (HierOcc) network whose core modules, height‐aware cross‐attention (HACA) and hierarchical self‐attention (HSA), improve the precision and completeness of 3D occupancy prediction. HACA performs the 2D–3D transformation, while HSA promotes communication among voxels. The key insight behind both modules is our multi‐height attention mechanism, in which each attention head corresponds explicitly to a specific height, thereby decoupling height information while maintaining global attention across the height dimension. Extensive experiments show that our method significantly improves on the baseline and surpasses all concurrent methods.
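The one-head-per-height idea can be illustrated with a minimal sketch. This is not the authors' implementation of HACA or HSA: the function name `multi_height_attention`, the plain dot-product attention, the per-height weight layout, and all shapes are assumptions made for illustration, and the abstract does not describe how the real modules sample image features. The sketch only shows how tying each head to one height level keeps the output of each height separate.

```python
import math
import random

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(vec, mat):
    """Project a length-C vector with a C x D matrix, giving a length-D vector."""
    return [sum(vec[c] * mat[c][d] for c in range(len(vec)))
            for d in range(len(mat[0]))]

def multi_height_attention(queries, image_feats, w_q, w_k, w_v):
    """Hypothetical sketch: attention head z produces the features of height level z.

    queries:       N x C, one query per ground-plane cell
    image_feats:   M x C, flattened 2D image features
    w_q, w_k, w_v: Z x C x D, one projection per height level (assumed layout)
    returns:       Z x N x D voxel features
    """
    voxels = []
    for z in range(len(w_q)):  # head z serves height level z only
        keys = [matvec(f, w_k[z]) for f in image_feats]
        values = [matvec(f, w_v[z]) for f in image_feats]
        level = []
        for q_in in queries:
            q = matvec(q_in, w_q[z])
            scale = math.sqrt(len(q))
            scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
            weights = softmax(scores)
            level.append([sum(w * v[d] for w, v in zip(weights, values))
                          for d in range(len(q))])
        voxels.append(level)
    return voxels

def random_matrix(rng, rows, cols):
    """Random rows x cols matrix, used only to build toy inputs."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Toy sizes (all hypothetical): 4 heights, 3 ground cells, 5 image features.
rng = random.Random(0)
Z, N, M, C, D = 4, 3, 5, 8, 2
queries = random_matrix(rng, N, C)
image_feats = random_matrix(rng, M, C)
w_q = [random_matrix(rng, C, D) for _ in range(Z)]
w_k = [random_matrix(rng, C, D) for _ in range(Z)]
w_v = [random_matrix(rng, C, D) for _ in range(Z)]

voxels = multi_height_attention(queries, image_feats, w_q, w_k, w_v)
print(len(voxels), len(voxels[0]), len(voxels[0][0]))  # prints: 4 3 2
```

In this sketch, changing the weights of one head changes only the voxel features at that head's height, which is one simple reading of "decoupling height information". Each head can still attend to every image feature.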

Publisher

Wiley

