Affiliation:
1. Pattern Recognition and Computer Vision Lab, Zhejiang Sci-Tech University, Hangzhou 310018, China
2. Key Laboratory of Digital Design and Intelligent Manufacture in Culture & Creativity Product of Zhejiang Province, Lishui University, Lishui 323000, China
Abstract
Modern neural networks addressing dense Non-Rigid Structure from Motion (NRSFM) dilemmas often grapple with intricate a priori constraints, deterring scalability, or overlook the imperative of consistent application of a priori knowledge throughout the entire input sequence. In this paper, an innovative neural network architecture is introduced. Initially, the complete 2D sequence image undergoes embedding into a low-dimensional space. Subsequently, multiple self-attention layers are employed to extract inter-frame features, with the objective of deriving a more continuous and temporally smooth low-dimensional structure closely resembling real data’s intrinsic structure. Moreover, it has been demonstrated by others that gradient descent during the training of multilayer linear networks yields minimum rank solutions, implicitly providing regularization that is equally applicable to this task. Benefiting from the excellence of the proposed network architecture, no additional a priori knowledge is mandated, barring the constraint of temporal smoothness. Extensive experimentation confirms the method’s exceptional performance in addressing dense NRSFM challenges, outperforming recent results across various dense benchmark datasets.
Funder
the Natural Science Foundation of Zhejiang Province
the National Natural Science Foundation of China
Subject
Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering