Xiaoxia Xing, Yinghao Cai, Tao Lu, Yiping Yang, Dayong Wen
Classical monocular Simultaneous Localization and Mapping (SLAM) and monocular depth estimation based on convolutional neural networks (CNNs) represent two different approaches to reconstructing the 3D geometry of a scene. In this paper, we combine SLAM and depth estimation so that each task benefits from the strengths of the other. For SLAM, running pseudo RGBD-SLAM with CNN-predicted depths improves the accuracy of visual odometry and mapping compared with the monocular SLAM baseline. For depth estimation, we use 3D scene structures from geometric SLAM to refine the pre-trained monocular depth estimation network, which did not reach its optimum because of photometric inconsistency. Moreover, the proposed method incorporates an optional Sparse Auxiliary Network [1] into the original depth estimation network, in which sparse depth features are dynamically combined with RGB features to predict the depth map. Experimental results on the KITTI and TUM RGB-D datasets show that our method achieves state-of-the-art performance on both depth prediction and pose estimation tasks.
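As a rough illustration of how sparse 3D structure from SLAM could supervise a depth network, the sketch below computes a masked L1 error between a predicted depth map and a sparse depth map that is zero wherever SLAM produced no point. This is an assumption made for illustration only: the abstract does not state the loss actually used, and the function name `masked_l1_depth_loss` and the zero-means-missing convention are hypothetical.

```python
import numpy as np


def masked_l1_depth_loss(pred_depth, sparse_depth):
    """Mean absolute depth error over pixels that have a sparse SLAM depth.

    Assumption (not from the paper): a value of 0 in `sparse_depth`
    marks a pixel with no SLAM point, so it is excluded from the loss.
    """
    pred_depth = np.asarray(pred_depth, dtype=np.float64)
    sparse_depth = np.asarray(sparse_depth, dtype=np.float64)
    if pred_depth.shape != sparse_depth.shape:
        raise ValueError("depth maps must have the same shape")

    valid = sparse_depth > 0  # boolean mask of supervised pixels
    if not valid.any():
        return 0.0  # no SLAM points in this frame, nothing to penalize

    return float(np.abs(pred_depth[valid] - sparse_depth[valid]).mean())


# Toy 2x2 example: only two pixels carry a SLAM depth.
example_pred = np.array([[2.0, 4.0], [6.0, 8.0]])
example_sparse = np.array([[1.0, 0.0], [0.0, 10.0]])
example_loss = masked_l1_depth_loss(example_pred, example_sparse)
```

In a real training loop the same masking idea would be applied to framework tensors so that gradients flow only through the supervised pixels; the NumPy version here only shows the arithmetic.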
Chao Fan, Zhenyu Yin, Fulong Xu, Anying Chai, Feiqing Zhang
Ue-Hwan Kim, Gyeong-Min Lee, Jong-Hwan Kim
Julio César Díaz Mendoza, Hélio Pedrini
Haifeng Hu, Yuyang Feng, Dapeng Li, Suofei Zhang, Haitao Zhao
Xingyu Chen, Thomas H. Li, Ruonan Zhang, Ge Li