Affiliation:
1. School of Electrical Engineering, Xinjiang University, Urumqi, Xinjiang 830017, P. R. China
Abstract
Three-dimensional (3D) face datasets are scarce, costly to acquire and lacking in diversity. To address this, the paper designs an end-to-end, self-supervised 3D face reconstruction network that takes a single 2D face image as input. The model bypasses 3D face datasets and is trained only on 2D face datasets, achieving high-precision 3D face reconstruction without any 3D face prior. First, an improved ResNet50 feature extraction module is introduced to extract and represent features of the input image with a deep convolutional network. Then, a lightweight convolutional block attention module is added to the face prediction subnetwork. Channel attention identifies what information the image contains, while spatial attention locates where that information is. Applied in sequence, the two attention operations can accurately find the features required for predicting each parameter, further improving the prediction accuracy of the face reconstruction parameters. Finally, training, ablation and comparison experiments were conducted on the CelebFaces Attributes, Basel Face Model and Photoface datasets, using a combined loss function of pixel loss and perceptual loss. The pixel loss is computed at the level of individual pixels, and the perceptual loss is computed on image-level convolutional features, so the two complement each other. Compared with the best previously reported results for the same network structure, the proposed algorithm improves the scale-invariant depth error and the mean angle deviation by 5.2% and 8.2%, respectively. The experimental results demonstrate the effectiveness of the algorithm.
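The two evaluation metrics named in the abstract are not defined there. The following is a minimal sketch under stated assumptions: scale-invariant depth error is taken to be the standard deviation of the log-depth difference between prediction and ground truth, and mean angle deviation is taken to be the average angle, in degrees, between predicted and ground-truth surface normals. These are common definitions in the single-image reconstruction literature and may differ from the exact formulas used in the paper. The function names and the flat-list inputs are illustrative only.

```python
import math


def scale_invariant_depth_error(pred, target):
    """Assumed definition: standard deviation of log(pred) - log(target).

    Multiplying every predicted depth by the same positive constant
    leaves the result unchanged, hence "scale-invariant".
    """
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equally long")
    diffs = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    mean = sum(diffs) / len(diffs)
    variance = sum(d * d for d in diffs) / len(diffs) - mean * mean
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


def mean_angle_deviation(pred_normals, target_normals):
    """Assumed definition: mean angle (degrees) between paired normals."""
    if not pred_normals or len(pred_normals) != len(target_normals):
        raise ValueError("normal lists must be non-empty and equally long")
    angles = []
    for n, m in zip(pred_normals, target_normals):
        dot = sum(a * b for a, b in zip(n, m))
        norm = math.sqrt(sum(a * a for a in n)) * math.sqrt(sum(b * b for b in m))
        # Clamp the cosine so rounding cannot push acos out of its domain.
        cosine = max(-1.0, min(1.0, dot / norm))
        angles.append(math.degrees(math.acos(cosine)))
    return sum(angles) / len(angles)
```

Under these assumed definitions, lower values are better for both metrics, so the reported 5.2% and 8.2% improvements would correspond to reductions in error.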
Funder
National Natural Science Foundation of China
Publisher
World Scientific Pub Co Pte Ltd