Affiliation:
1. School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
2. Institute of Quantitative Remote Sensing and Smart Agriculture, Henan Polytechnic University, Jiaozuo, China
3. School of Computing and Mathematical Sciences, University of Leicester, Leicester, UK
Abstract
X‐ray security checks aim to detect contraband in luggage; however, detection accuracy is hindered by overlapping objects and by the large differences in object size in X‐ray images. To address these challenges, the authors introduce a novel network model named Multi‐Scale Feature Attention (MSFA)‐DEtection TRansformer (DETR). First, a pyramid feature extraction structure is embedded into the self‐attention module; the resulting module is referred to as MSFA. Using the MSFA module, MSFA‐DETR extracts multi‐scale feature information and fuses it into high‐level semantic features. These features are then combined through attention mechanisms to capture correlations between global information and multi‐scale features. MSFA improves the model's robustness to objects of different sizes, thereby enhancing detection accuracy. In addition, a new initialisation method for object queries is proposed: the authors' foreground sequence extraction (FSE) module extracts key feature sequences from feature maps to serve as prior knowledge for object queries. FSE accelerates the convergence of the DETR model and improves detection accuracy. Extensive experiments show that the proposed model outperforms state‐of‐the‐art methods on the CLCXray and PIDray datasets.
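The abstract describes two ideas without implementation details: attention over pyramid (multi‐scale) features, and initialising object queries from selected foreground feature tokens. The sketch below is only an illustration of those general ideas under stated assumptions, not the authors' MSFA or FSE implementation. All function names (`avg_pool_tokens`, `multi_scale_attention`, `select_foreground_queries`) are hypothetical. It also assumes pooling scales of (1, 2, 4), identity query/key/value projections in place of learned ones, and a token's L2 norm as a stand‐in "foreground score".

```python
import math
from typing import List

Tokens = List[List[float]]  # a sequence of token vectors, shape (num_tokens, dim)


def avg_pool_tokens(fmap: List[List[List[float]]], s: int) -> Tokens:
    """Average-pool an H x W x C feature map with an s x s window (stride s) into tokens."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    if h % s or w % s:
        raise ValueError("feature map size must be divisible by every scale")
    tokens = []
    for i in range(0, h, s):
        for j in range(0, w, s):
            window = [fmap[a][b] for a in range(i, i + s) for b in range(j, j + s)]
            tokens.append([sum(v[ch] for v in window) / len(window) for ch in range(c)])
    return tokens


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def attention(queries: Tokens, keys: Tokens, values: Tokens) -> Tokens:
    """Scaled dot-product attention (no learned projections in this sketch)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(wt * v[ch] for wt, v in zip(weights, values))
                    for ch in range(len(values[0]))])
    return out


def multi_scale_attention(fmap, scales=(1, 2, 4)) -> Tokens:
    """Hypothetical: full-resolution queries attend to keys/values pooled at several scales."""
    queries = avg_pool_tokens(fmap, 1)  # scale 1 is simply the flattened map
    kv: Tokens = []
    for s in scales:
        kv.extend(avg_pool_tokens(fmap, s))  # pyramid tokens concatenated into one sequence
    return attention(queries, kv, kv)


def select_foreground_queries(tokens: Tokens, k: int) -> Tokens:
    """Hypothetical: keep the k tokens with the largest L2 norm as initial object queries."""
    order = sorted(range(len(tokens)),
                   key=lambda i: -math.sqrt(sum(x * x for x in tokens[i])))
    return [tokens[i] for i in order[:k]]


if __name__ == "__main__":
    fmap = [[[float(i), float(j)] for j in range(4)] for i in range(4)]  # 4 x 4 x 2 toy map
    fused = multi_scale_attention(fmap)
    print(len(fused), len(fused[0]))
    print(select_foreground_queries(avg_pool_tokens(fmap, 1), 3))
```

Concatenating pooled tokens from several scales lets each full‐resolution query compare itself against both fine and coarse context in a single attention step. Ranking tokens by a score and keeping the top k is one simple way to give object queries content‐based priors instead of purely learned embeddings. The real scoring rule and pyramid design used by MSFA‐DETR are not specified in the abstract.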
Funder
Science and Technology Department of Henan Province
National Natural Science Foundation of China
Publisher
Institution of Engineering and Technology (IET)