Monocular depth estimation is a fundamental technique for robots to perceive unseen real-world scenes. Supervised methods rely on large-scale datasets with ground-truth (GT) depth labels and therefore generalize poorly to other scenes. A dominant solution is to train the model directly on target scenes in a self-supervised way with pseudo depth labels (e.g., generated by stereo matching). However, pseudo depth labels are often unreliable, especially near object boundaries; such noise may disturb training and consequently degrade depth quality at inference. In this paper, we investigate the structure similarity between RGB and depth based on Gaussian kernels, since the structure of the RGB image is always reliable. This RGB-Depth structure similarity measurement is then used to improve self-supervised depth estimation in two aspects: it first measures the confidence of pseudo depth labels and filters out unreliable pixels, and it then serves as a loss term that constrains the structure of the predicted depth maps. Experiments on the KITTI Eigen split verify that the proposed method achieves better or comparable quantitative results, and consistently better visual results with clear depth boundaries, compared with five recent baselines.
Chenggong Han, Deqiang Cheng, Qiqi Kou, Xiaoyi Wang, Liangliang Chen, Jiamin Zhao