Abstract
Several methods exist for the initial alignment of Building Information Models (BIMs) to the real building in Mixed Reality (MR) applications, such as marker-based or markerless visual methods, but this alignment is susceptible to drift over time. Existing model-based methods for maintaining the alignment have several limitations, including reliance on iterative processes and poor performance in environments with either too many or too few lines. To address these issues, we propose an end-to-end trainable Convolutional Neural Network (CNN) that takes a real image and a synthetic BIM image as an input pair and directly regresses the 6 DoF relative camera pose difference between them. By correcting the relative pose error, we considerably improve the alignment of the BIM to the real building. Furthermore, the results of our experiments demonstrate good performance in a challenging environment and high resilience to the domain shift between synthetic and real images. A high localisation accuracy of approximately 7.0 cm and 0.9° is achieved, which indicates that the method can be used to reduce camera tracking drift in MR applications.
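As a rough illustration of the kind of network the abstract describes, the sketch below shows a relative-pose regression CNN that encodes a real image and a rendered BIM image with a shared backbone and regresses a 6 DoF pose difference from the concatenated features. This is not the authors' implementation; the backbone (ResNet-18), feature dimension, and the translation-plus-quaternion output parameterisation are all assumptions made for the example.

```python
# Minimal sketch (assumed architecture, not the paper's exact network) of a
# relative camera pose regression CNN for a real / synthetic BIM image pair.
import torch
import torch.nn as nn
import torchvision.models as models


class RelativePoseNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared (siamese) feature extractor; final classification layer removed.
        backbone = models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        feat_dim = 512
        # Regression head: 3 translation components + 4 quaternion components.
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 7),
        )

    def forward(self, real_img, synthetic_img):
        f_real = self.encoder(real_img).flatten(1)
        f_syn = self.encoder(synthetic_img).flatten(1)
        out = self.head(torch.cat([f_real, f_syn], dim=1))
        t, q = out[:, :3], out[:, 3:]
        q = q / q.norm(dim=1, keepdim=True)  # normalise to a unit quaternion
        return t, q


if __name__ == "__main__":
    net = RelativePoseNet()
    real = torch.randn(1, 3, 224, 224)    # real camera image
    synth = torch.randn(1, 3, 224, 224)   # synthetic image rendered from the BIM
    translation, rotation = net(real, synth)
    print(translation.shape, rotation.shape)  # (1, 3) (1, 4)
```

The predicted relative pose would then be applied as a correction to the current camera pose estimate, counteracting the accumulated tracking drift described above.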