Author:
Ning Tao, Gao Yuan, Han Yumeng
Abstract
To address the challenges posed by diverse pattern-background elements, intricate details, and complex textures in the semantic segmentation of ethnic clothing patterns, this research introduces a novel semantic segmentation network called MST-Unet (Mixed Swin Transformer U-net). The proposed model combines a U-shaped network structure with multiple attention mechanisms. The upper layers use classical convolutional operations, which focus on local relationships in the initial layers containing high-resolution details. The deeper layers use Swin Transformer modules, which extract features efficiently at smaller spatial dimensions, maintaining performance while reducing the computational burden. An attention gate mechanism is integrated into the decoder, allowing the model to better capture crucial image features and achieve more precise segmentation of ethnic clothing patterns. In visual comparisons, the segmentation results of the proposed model preserve edge contours more completely and contain fewer misclassifications in irrelevant image regions. In qualitative and quantitative experiments on the ethnic clothing pattern dataset, the model achieves the highest Dice score in all four subclasses of ethnic clothing patterns. Its average Dice score reaches 89.80%, surpassing other algorithms in the same category: compared with the Deeplab_V3+, ResUnet, SwinUnet, and Unet networks, it is higher by 7.72%, 5.09%, 5.05%, and 0.67%, respectively.
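The Dice score reported above measures the overlap between a predicted mask and a ground-truth mask as twice the intersection divided by the sum of the two mask sizes. The following is a minimal sketch of that metric, not the authors' evaluation code. The function names `dice_score` and `mean_dice`, the flat-list mask representation, the integer class labels, and the convention of scoring two empty masks as 1.0 are all assumptions made for illustration; the abstract does not state how the per-subclass scores were averaged.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


def mean_dice(pred_labels, target_labels, classes):
    """Average the per-class Dice score over the given class labels (hypothetical helper)."""
    scores = []
    for c in classes:
        # Turn the label maps into one binary mask per class.
        pred_mask = [1 if p == c else 0 for p in pred_labels]
        target_mask = [1 if t == c else 0 for t in target_labels]
        scores.append(dice_score(pred_mask, target_mask))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 4-pixel label maps with three illustrative classes.
    pred = [0, 1, 1, 2]
    target = [0, 1, 2, 2]
    print(round(mean_dice(pred, target, classes=[0, 1, 2]), 4))
```

In this toy case the class 0 masks match exactly, while classes 1 and 2 each share one pixel out of three marked in total, so the averaged value falls below 1.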
Publisher
Springer Science and Business Media LLC