Underwater Dam Crack Image Classification Algorithm Based on Improved VanillaNet
Author:
Zhu Sisi1, Li Xinyu1, Wan Gang1, Wang Hanren2, Shao Shen2, Shi Pengfei2
Affiliation:
1. Hubei Technology Innovation Center for Smart Hydropower, Wuhan 430019, China; 2. College of Artificial Intelligence and Automation, Hohai University, Changzhou 213200, China
Abstract
In the task of classifying images of cracks in underwater dams, symmetry is a crucial geometric feature that helps distinguish cracks from other structural elements. However, the asymmetric distribution of positive and negative samples in underwater dam crack image datasets produces a long-tail problem. This imbalance, combined with the subtle nature of crack features, causes existing convolutional neural networks to extract features inadequately, reducing classification accuracy. To address these issues, this paper improves VanillaNet. First, the Seesaw Loss function is introduced to tackle the long-tail problem in classifying underwater dam crack images, strengthening the model's recognition of tail categories. Second, the Adaptive Frequency Filtering Token Mixer (AFF Token Mixer) is incorporated to improve the model's ability to capture crack image features and raise classification accuracy. Finally, label smoothing is applied to prevent overfitting to the training data and improve the model's generalization performance. The experimental results demonstrate that the proposed improvements significantly enhance classification accuracy on underwater dam crack images: the optimized algorithm surpasses the strongest competing models, ConvNeXtV2 and RepVGG, by 1.29% and 0.64% in average accuracy, respectively, and exceeds the original VanillaNet by 2.66%. The improved model also achieves higher accuracy than other mainstream networks.
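Two of the techniques named above can be illustrated compactly. The sketch below is not the paper's implementation; it shows, in plain Python, (a) the class-frequency mitigation factor used by Seesaw Loss to soften the gradient penalty that head classes impose on tail classes, and (b) the standard label-smoothing target distribution. The function names, the exponent default `p=0.8`, and the smoothing value `epsilon=0.1` are illustrative assumptions, not values taken from the paper.

```python
def seesaw_mitigation(n_pos, n_neg, p=0.8):
    """Seesaw-style mitigation factor for one (positive, negative) class pair.

    If the negative class is rarer than the positive (ground-truth) class,
    its punishment is down-weighted by (N_neg / N_pos) ** p; otherwise the
    factor stays at 1, leaving the loss unchanged for head classes.
    """
    return (n_neg / n_pos) ** p if n_neg < n_pos else 1.0


def smooth_labels(num_classes, true_class, epsilon=0.1):
    """Return a label-smoothed target distribution over num_classes.

    The true class receives 1 - epsilon + epsilon/K and every other class
    receives epsilon/K, so the target still sums to 1 but no longer puts
    all its mass on a single class, which curbs overfitting.
    """
    base = epsilon / num_classes
    target = [base] * num_classes
    target[true_class] += 1.0 - epsilon
    return target
```

For example, with a head class of 1000 samples and a tail class of 10, `seesaw_mitigation(1000, 10, p=1.0)` yields 0.01, so the rare class is penalized far less when it appears as a negative for the common one.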
Funder
National Key R&D Program of China; Jiangsu Province Natural Science Foundation; Open Research Fund of Hubei Technology Innovation Center for Smart Hydropower; Changzhou Sci&Tech Program