A Lightweight Face Detector via Bi-Stream Convolutional Neural Network and Vision Transformer
Published:2024-05-20
Issue:5
Volume:15
Page:290
ISSN:2078-2489
Container-title:Information
language:en
Short-container-title:Information
Author:
Zhang Zekun1, Chao Qingqing1, Wang Shijie1, Yu Teng1
Affiliation:
1. College of Electronic Information, Qingdao University, Qingdao 260000, China
Abstract
Lightweight convolutional neural networks are widely used for face detection because they learn local representations through spatial inductive bias and translation invariance. However, convolutional face detectors struggle under challenging conditions such as occlusion, blur, and changes in facial pose, primarily because of their fixed-size receptive fields and lack of global modeling. Transformer-based models are good at learning global representations but are less sensitive to local patterns. To address these limitations, we propose an efficient face detector that combines convolutional neural network and transformer architectures. We introduce a bi-stream structure that integrates convolutional neural network and transformer blocks within the backbone network, preserving local pattern features while extracting global context. To further preserve the local details captured by convolutional neural networks, we propose a feature enhancement convolution block in a hierarchical backbone structure. Additionally, we devise a multiscale feature aggregation module to enhance obscured and blurred facial features. Experimental results show that our method improves lightweight face detection accuracy, achieving average precision of 95.30%, 94.20%, and 87.56% on the easy, medium, and hard subsets of WIDER FACE, respectively. We therefore believe our method will be a useful addition to current artificial intelligence models and will benefit engineering applications of face detection.
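The average precision figures reported in the abstract summarize a precision-recall curve over ranked detections. The following is a minimal sketch of that computation, not the authors' evaluation code and not the official WIDER FACE toolkit. It assumes each detection has already been labeled as a true or false positive (for example by IoU matching against ground-truth boxes, which is omitted here) and uses all-point interpolation. The function name `average_precision` and its parameters are hypothetical.

```python
def average_precision(scored_hits, num_ground_truth):
    """Area under the precision-recall curve (all-point interpolation).

    scored_hits: list of (confidence, is_true_positive) pairs, one per detection.
                 The true/false-positive label is assumed to be precomputed.
    num_ground_truth: total number of annotated faces in the evaluated subset.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")

    # Rank detections from most to least confident.
    ranked = sorted(scored_hits, key=lambda pair: pair[0], reverse=True)

    precisions, recalls = [], []
    true_pos = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_pos += 1
        precisions.append(true_pos / rank)
        recalls.append(true_pos / num_ground_truth)

    # Make precision non-increasing by scanning from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangles of height precision and width equal to the recall step.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


# Toy example with made-up detections: two annotated faces, three detections.
detections = [(0.9, True), (0.8, False), (0.7, True)]
print(f"AP = {average_precision(detections, num_ground_truth=2):.4f}")
```

Under these assumptions, a detector scores well only if its true positives rank above its false positives. This helps explain why a hard subset, with more occluded and blurred faces, would be expected to yield a lower value than the easy subset.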
References: 45 articles.