Abstract
Estimating depth from a single RGB image, known as monocular depth estimation, is a challenging problem. Most current methods design increasingly complex networks that regress the depth map directly. We instead adopt a more interpretable approach based on Conditional Random Fields (CRFs), which cast depth estimation as an optimization problem. To improve information transfer between nodes, we use a multi-head attention mechanism to compute multiple energy functions, which the network then optimizes to produce an accurate depth map. Experiments demonstrate that our method accurately estimates depth in landscape scenes.
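The idea of attention-derived CRF energies can be illustrated with a minimal NumPy sketch. It assumes a fully connected CRF over N nodes with a quadratic unary term and an L1 pairwise term, where each attention head supplies its own pairwise weights w_h(i, j), giving one energy per head. The function names, tensor shapes, and the specific unary and pairwise forms are hypothetical choices for illustration, not the formulation used in this work. The network-driven optimization of these energies is not shown.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_crf_energies(features, depth, unary_depth, w_q, w_k):
    """Return one CRF energy per attention head (hypothetical formulation).

    features:    (N, C) per-node features
    depth:       (N,)   candidate depth values x_i
    unary_depth: (N,)   per-node depth prediction used by the unary term
    w_q, w_k:    (H, C, D) query/key projections, one pair per head
    """
    n_heads, _, d = w_q.shape
    # Assumed unary term: squared deviation from the per-node prediction.
    unary = np.sum((depth - unary_depth) ** 2)
    # |x_i - x_j| for every node pair, shape (N, N).
    diff = np.abs(depth[:, None] - depth[None, :])
    energies = np.empty(n_heads)
    for h in range(n_heads):
        q = features @ w_q[h]                           # queries, (N, D)
        k = features @ w_k[h]                           # keys, (N, D)
        attn = softmax(q @ k.T / np.sqrt(d), axis=-1)   # pairwise weights w_h(i, j)
        # Energy for head h: unary term plus attention-weighted pairwise term.
        energies[h] = unary + np.sum(attn * diff)
    return energies
```

Because the softmax weights are non-negative, every head's energy is zero only when the depth map is constant and matches the unary prediction, and it grows with disagreement between connected nodes.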