Delicate features tend to weaken or even disappear in deeper network layers, which results in jagged object edges in depth estimation. To address this problem, we propose a multi-resolution attention based monocular depth estimation network that captures and recalibrates channel dependencies across different resolutions. Specifically, we develop a multi-resolution attention encoder that embeds resolution and spatial attention to extract higher-quality features, and we design a reusable multi-resolution fusion module for the decoder, in which features at different scales are integrated and multi-resolution attention maps are extracted, enhancing detail information locally and perspective-structure features globally. Experimental results show that the proposed network predicts more precise depth and sharper contours, and outperforms related methods on the KITTI benchmark.
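To make the fusion-and-recalibration idea concrete, below is a minimal NumPy sketch of one decoder step: a lower-resolution feature map is upsampled, concatenated with a higher-resolution one, and the fused channels are rescaled by a squeeze-and-excitation-style gate. All function names, the gate architecture, and the random weights are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def channel_attention(feats):
    """Hypothetical SE-style channel recalibration.
    feats: array of shape (C, H, W)."""
    c = feats.shape[0]
    # Squeeze: global average pool per channel -> (C,)
    z = feats.mean(axis=(1, 2))
    # Excite: tiny two-layer gate (random weights, for illustration only)
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // 2, c)) * 0.1
    w2 = rng.standard_normal((c, c // 2)) * 0.1
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0))))  # sigmoid gate, (C,)
    # Recalibrate: rescale each channel by its gate value
    return feats * s[:, None, None]

def multi_resolution_fusion(feat_hi, feat_lo):
    """Upsample the low-resolution map (nearest neighbour), concatenate
    along the channel axis, then recalibrate the fused channels."""
    scale = feat_hi.shape[1] // feat_lo.shape[1]
    up = feat_lo.repeat(scale, axis=1).repeat(scale, axis=2)
    fused = np.concatenate([feat_hi, up], axis=0)
    return channel_attention(fused)
```

In a real network the gate weights would be learned end-to-end and this fusion block reused at every decoder resolution, which is what makes the module "reusable" in the abstract's sense.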
Saddam Abdulwahab, Hatem A. Rashwan, Moumen El-Melegy, Domènec Puig
Sangam Man Buddhacharya, Rabin Adhikari, Nischal Maharjan, Sanjeeb Prasad Panday
Kyuhong Shim, Ji-Young Kim, Gusang Lee, Byonghyo Shim