PointBLIP: Zero-Training Point Cloud Classification Network Based on BLIP-2 Model
Published: 2024-07-03
Issue: 13
Volume: 16
Page: 2453
ISSN: 2072-4292
Container-title: Remote Sensing
Language: en
Short-container-title: Remote Sensing
Author:
Xiao Yunzhe 1 (ORCID), Dou Yong 1, Yang Shaowu 1
Affiliation:
1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
Abstract
Leveraging the open-world understanding of large-scale visual-language pre-trained models has become a focal point in point cloud classification. Recent approaches classify point clouds by projecting them into 2D images and evaluating their consistency with textual prompts using transferable visual-language pre-trained models. These methods benefit from the robust open-world understanding of such models and require no additional training, but they face several challenges, summarized as prompt ambiguity, image domain gap, view weight confusion, and feature deviation. To address these challenges, we propose PointBLIP, a zero-training point cloud classification network based on the recently introduced BLIP-2 visual-language model. PointBLIP is adept at processing similarities between multiple images and multiple prompts. We introduce separate methods for zero-shot and few-shot point cloud classification, both of which compare multiple features to achieve effective classification, and we improve the quality of the input data on both the image and text sides of PointBLIP. On zero-shot point cloud classification, PointBLIP outperforms state-of-the-art methods on three benchmark datasets. On few-shot classification, to the best of our knowledge, we present the first zero-training few-shot point cloud method, surpassing previous works under the same conditions and achieving performance comparable to fully trained methods.
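The core mechanism the abstract describes, rendering a point cloud into multiple 2D views and scoring them against multiple class prompts, can be illustrated with a minimal runnable sketch. Everything below is an illustrative assumption rather than the paper's implementation: the depth-projection routine, the prompt templates, and especially encode_image and encode_text, which are random-projection and bag-of-words placeholders standing in for BLIP-2's real encoders.

import numpy as np

def project_depth_views(points, n_views=6, size=64):
    """Render an (N, 3) point cloud into `n_views` crude depth images
    by rotating around the vertical axis and splatting points."""
    views = []
    for v in range(n_views):
        theta = 2.0 * np.pi * v / n_views
        rot = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                        [0.0, 1.0, 0.0],
                        [-np.sin(theta), 0.0, np.cos(theta)]])
        p = points @ rot.T
        xy = p[:, :2]
        xy = (xy - xy.min(axis=0)) / (np.ptp(xy, axis=0) + 1e-8)
        idx = np.clip((xy * (size - 1)).astype(int), 0, size - 1)
        depth = p[:, 2] - p[:, 2].min()
        img = np.zeros((size, size))
        # Keep the largest depth per pixel as a crude z-buffer.
        np.maximum.at(img, (idx[:, 1], idx[:, 0]), depth)
        views.append(img)
    return np.stack(views)

rng = np.random.default_rng(0)
W_IMG = rng.standard_normal((64 * 64, 128))  # tied to size=64 above

def encode_image(img):
    # Placeholder: random projection in place of a BLIP-2 image encoder.
    f = img.reshape(-1) @ W_IMG
    return f / (np.linalg.norm(f) + 1e-8)

def encode_text(prompt):
    # Placeholder: hashed bag-of-words in place of a BLIP-2 text encoder.
    f = np.zeros(128)
    for tok in prompt.lower().split():
        f[hash(tok) % 128] += 1.0
    return f / (np.linalg.norm(f) + 1e-8)

def classify(points, class_names, n_views=6):
    imgs = project_depth_views(points, n_views)
    img_feats = np.stack([encode_image(i) for i in imgs])         # (V, D)
    # Multiple prompts per class, as in prompt-ensemble zero-shot setups.
    templates = ["a depth map of a {}", "a 3D rendering of a {}"]
    scores = []
    for name in class_names:
        txt_feats = np.stack([encode_text(t.format(name))
                              for t in templates])                # (P, D)
        sim = img_feats @ txt_feats.T                             # (V, P)
        # Aggregate the multi-view x multi-prompt similarity matrix.
        scores.append(sim.mean())
    return class_names[int(np.argmax(scores))]

if __name__ == "__main__":
    cloud = np.random.default_rng(1).standard_normal((1024, 3))
    print(classify(cloud, ["chair", "table", "airplane"]))

Note that the plain mean in `scores.append(sim.mean())` weights all views and prompts equally; this aggregation step is exactly where the "view weight confusion" challenge named in the abstract arises, and the paper's method processes the multi-image, multi-prompt similarities more carefully than this sketch does.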
Funder
National Key R&D Program of China