CMAF: Cross-Modal Augmentation via Fusion for Underwater Acoustic Image Recognition

Published: 2024-01-11 Issue: 5 Volume: 20 Page: 1-25
ISSN: 1551-6857
Container-title: ACM Transactions on Multimedia Computing, Communications, and Applications
Language: en
Short-container-title: ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Yang Shih-Wei¹, Shen Li-Hsiang², Shuai Hong-Han¹, Feng Kai-Ten¹

Affiliation:

1. National Yang Ming Chiao Tung University, Taiwan

2. National Central University, Taiwan

Abstract

Underwater image recognition is crucial for underwater detection applications, and fish classification has emerged as an active research area in recent years. Existing image classification models are usually trained on data collected from terrestrial environments, which makes them unsuitable for underwater images, whose features are incomplete and noisy. To address this, we propose a cross-modal augmentation via fusion (CMAF) framework for acoustic-based fish image classification. Our approach separates the process into two branches, a visual modality and a sonar signal modality, where the latter provides a complementary character feature. We augment the visual modality, design an attention-based fusion module, and adopt a masking-based training strategy with a mask-based focal loss to improve the learning of local features and address the class imbalance problem. Our proposed method outperforms state-of-the-art methods. Our source code is available at https://github.com/WilkinsYang/CMAF.
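
The abstract names a mask-based focal loss as its remedy for class imbalance but does not define it in this record. As a rough illustration of the underlying idea only, the sketch below implements the standard multi-class focal loss, not the paper's mask-based variant. The function names, the `gamma` default, and the example logits are assumptions chosen for illustration and are not taken from the CMAF source code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Standard multi-class focal loss for a single sample.

    logits: raw class scores; target: index of the true class.
    gamma:  focusing parameter (gamma=0 recovers plain cross-entropy).
    alpha:  optional per-class weights (hypothetical, for illustration).
    """
    p_t = softmax(logits)[target]
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t)^gamma shrinks the loss of confidently correct samples,
    # so hard or rare-class samples dominate the total.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct sample versus a confidently wrong one.
easy = focal_loss([4.0, 0.0, 0.0], target=0)
hard = focal_loss([0.0, 4.0, 0.0], target=0)
# With gamma=0 the loss reduces to ordinary cross-entropy.
ce = focal_loss([4.0, 0.0, 0.0], target=0, gamma=0.0)
```

In this sketch the well-classified sample contributes far less than its cross-entropy value would, while the misclassified sample keeps most of its loss. That down-weighting is the general mechanism focal losses use against class imbalance. How CMAF combines it with masking is described in the paper itself.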

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture

Link

https://dl.acm.org/doi/pdf/10.1145/3636427
