A Lightweight Face Detector via Bi-Stream Convolutional Neural Network and Vision Transformer

Published: 2024-05-20 Issue: 5 Volume: 15 Page: 290
ISSN: 2078-2489
Container-title: Information
Language: en
Short-container-title: Information

Author:

Zhang Zekun¹, Chao Qingqing¹, Wang Shijie¹, Yu Teng¹

Affiliation:

1. College of Electronic Information, Qingdao University, Qingdao 260000, China

Abstract

Lightweight convolutional neural networks are widely used for face detection because they learn local representations through spatial inductive bias and translational invariance. However, convolutional face detectors struggle under challenging conditions such as occlusion, blurring, or changes in facial pose, primarily because of their fixed-size receptive fields and lack of global modeling. Transformer-based models are better at learning global representations but are less sensitive to local patterns. To address these limitations, we propose an efficient face detector that combines convolutional neural network and transformer architectures. We introduce a bi-stream structure that integrates convolutional neural network and transformer blocks within the backbone network, preserving local pattern features while extracting global context. To further preserve the local details captured by convolutional neural networks, we propose a feature enhancement convolution block in a hierarchical backbone structure. Additionally, we devise a multiscale feature aggregation module to enhance obscured and blurred facial features. Experimental results demonstrate that our method improves lightweight face detection accuracy, achieving an average precision of 95.30%, 94.20%, and 87.56% on the easy, medium, and hard subsets of WIDER FACE, respectively. We therefore believe our method will be a useful addition to current artificial intelligence models and will benefit engineering applications of face detection.

Publisher

MDPI AG

Link

https://www.mdpi.com/2078-2489/15/5/290/pdf

References: 45 articles.

1. Zhang, S., Zhu, R., Wang, X., Shi, H., Fu, T., Wang, S., Mei, T., and Li, S. (2019). Improved selective refinement network for face detection. arXiv.

2. Kuzdeuov, A., Koishigarina, D., and Varol, H.A. (2023, January 13–16). Anyface: A data-centric approach for input-agnostic face detection. Proceedings of the 2023 IEEE International Conference on Big Data and Smart Computing (BigComp), Jeju, Republic of Korea.

3. Howard, A.G., Sandler, M., Chu, G., Chen, L.-C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., and Vasudevan, V. (2019, October 27–November 2). Searching for MobileNetV3. Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.

4. Wang, H., Li, Z., Ji, X., and Wang, Y. (2017). Face R-CNN. arXiv.

5. He, K., Gkioxari, G., Dollár, P., and Girshick, R.B. (2017). Mask R-CNN. arXiv.