Underwater Dam Crack Image Classification Algorithm Based on Improved VanillaNet
Author:
Zhu Sisi1, Li Xinyu1, Wan Gang1, Wang Hanren2, Shao Shen2, Shi Pengfei2
Affiliation:
1. Hubei Technology Innovation Center for Smart Hydropower, Wuhan 430019, China; 2. College of Artificial Intelligence and Automation, Hohai University, Changzhou 213200, China
Abstract
In the task of classifying images of cracks in underwater dams, symmetry is a crucial geometric feature that helps distinguish cracks from other structural elements. However, the asymmetric distribution of positive and negative samples in underwater dam crack image datasets produces a long-tail problem. This imbalance, combined with the subtle nature of crack features, causes existing convolutional neural networks to extract features inadequately, reducing classification accuracy. To address these issues, this paper improves VanillaNet. First, the Seesaw Loss function is introduced to tackle the long-tail problem in classifying underwater dam crack images, strengthening the model's recognition of tail categories. Second, the Adaptive Frequency Filtering Token Mixer (AFF Token Mixer) is incorporated to improve the model's ability to capture crack image features and raise classification accuracy. Finally, label smoothing is applied to prevent overfitting to the training data and improve the model's generalization performance. The experimental results demonstrate that the proposed improvements significantly enhance classification accuracy on underwater dam crack images: the optimized algorithm surpasses the strongest competing models, ConvNeXtV2 and RepVGG, by 1.29% and 0.64% in average accuracy, respectively, and exceeds the original VanillaNet by 2.66%. The improved model also achieves higher accuracy than other mainstream networks.
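Two of the techniques named above can be illustrated compactly. The sketch below is not the paper's implementation; it shows, in plain Python, (a) the class-frequency mitigation factor used by Seesaw Loss to soften the gradient penalty that head classes impose on tail classes, and (b) the standard label-smoothing target distribution. The function names, the exponent default `p=0.8`, and the smoothing value `epsilon=0.1` are illustrative assumptions, not values taken from the paper.

```python
def seesaw_mitigation(n_pos, n_neg, p=0.8):
    """Seesaw-style mitigation factor for one (positive, negative) class pair.

    If the negative class is rarer than the positive (ground-truth) class,
    its punishment is down-weighted by (N_neg / N_pos) ** p; otherwise the
    factor stays at 1, leaving the loss unchanged for head classes.
    """
    return (n_neg / n_pos) ** p if n_neg < n_pos else 1.0


def smooth_labels(num_classes, true_class, epsilon=0.1):
    """Return a label-smoothed target distribution over num_classes.

    The true class receives 1 - epsilon + epsilon/K and every other class
    receives epsilon/K, so the target still sums to 1 but no longer puts
    all its mass on a single class, which curbs overfitting.
    """
    base = epsilon / num_classes
    target = [base] * num_classes
    target[true_class] += 1.0 - epsilon
    return target
```

For example, with a head class of 1000 samples and a tail class of 10, `seesaw_mitigation(1000, 10, p=1.0)` yields 0.01, so the rare class is penalized far less when it appears as a negative for the common one.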
Funder
National Key R&D Program of China; Jiangsu Province Natural Science Foundation; Open Research Fund of Hubei Technology Innovation Center for Smart Hydropower; Changzhou Sci&Tech Program