Affiliation:
1. School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China
Abstract
Visual defect recognition techniques based on deep learning models are crucial for modern industrial quality inspection. The backbone, which serves as the primary feature extraction component of a defect recognition model, has not been thoroughly explored. High‐performance vision transformers (ViTs) are rarely adopted in industrial scenarios because of their high computational complexity and the limited computational resources and storage hardware available there. This paper presents LSA‐Former, a lightweight transformer backbone that integrates the benefits of convolution and ViT. LSA‐Former introduces a novel self‐attention mechanism with linear computational complexity, enabling it to capture local and global semantic features with fewer parameters. LSA‐Former is pre‐trained on ImageNet‐1K and surpasses state‐of‐the‐art methods. It is then employed as the backbone for various detectors and evaluated on the PCB defect detection task. The proposed method reduces the parameter count by at least 18M and exceeds the baseline by more than 2.2 mAP.
Funder
National Natural Science Foundation of China
Publisher
Institution of Engineering and Technology (IET)