Abstract
Fine-grained image classification of marine organisms involves distinguishing subcategories within a larger category, such as specific species of fish or types of algae. This task is more difficult than general image classification because the subtle feature differences between subcategories are often concentrated in one or a few specific regions. Accurately locating these critical regions and making effective use of local features are therefore crucial to improving fine-grained classification accuracy. Existing methods for fine-grained image classification primarily rely on single-branch models based on either convolutional neural networks (CNNs) or vision transformers (ViTs). CNNs are strong at extracting local features, whereas ViTs are strong at modelling global context, so merging the two allows for a more comprehensive understanding of marine organism images. In addition, marine organism images are affected by shooting distance and angle, making it challenging to capture detailed local nuances at a single scale. To address these challenges, we propose a multi-scale dual-branch network (MSDBN) that combines the strengths of ViT and CNN for fine-grained image classification of marine organisms. Our model uses a novel two-stage selection module to select discriminative regions from the ViT branch, after which the CNN branch performs more detailed feature extraction on these local regions. To effectively utilise the multi-scale information of marine organisms, we introduce a multi-scale shift-window self-attention designed specifically for the ViT branch. MSDBN outperforms both existing classical methods and the best-performing dual-branch methods on three marine datasets. Our code is publicly available at https://github.com/Xiaosigz/MSDBN.
Funder
National Natural Science Foundation of China
TaiShan Scholars Youth Expert Program of Shandong Province
Publisher
Springer Science and Business Media LLC