Feng Liu, Ming Huang, Hongyu Ge, Dan Tao, Ruipeng Gao
Estimating monocular depth and ego-motion via unsupervised learning has emerged as a promising approach in autonomous driving, mobile robots, and AR/VR applications. It avoids the intensive effort of collecting large amounts of ground truth, and further improves the scene reconstruction density and long-term tracking accuracy of SLAM systems. However, existing approaches are susceptible to illumination variations and motion blur caused by fast movements in real-world driving scenarios. In this paper, we propose a novel unsupervised learning framework that fuses the complementary strengths of visual and inertial measurements for monocular depth estimation. It learns both forward and backward inertial sequences in multiple subspaces to produce environment-independent and scale-consistent motion features, and selectively weights the inertial and visual modalities to adapt to various scenes and motion states. In addition, we explore a novel virtual stereo model that incorporates such depth estimates into the monocular SLAM system, thus improving the system's efficiency and accuracy. Extensive experiments on the KITTI, EuRoC, and TUM datasets demonstrate our effectiveness in terms of monocular depth estimation, SLAM initialization efficiency, and pose estimation accuracy compared with the state of the art.
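The abstract describes an adaptive fusion step that weights inertial and visual features depending on the scene and motion state. The paper's exact architecture is not given here; the sketch below is a minimal, hypothetical illustration of one way such a gated fusion could look, assuming a bidirectional GRU over raw IMU readings and a learned sigmoid gate (class and parameter names such as `GatedVisualInertialFusion`, `vis_dim`, and `imu_dim` are invented for illustration).

```python
import torch
import torch.nn as nn

class GatedVisualInertialFusion(nn.Module):
    """Sketch: learned gate that blends visual and inertial motion features."""

    def __init__(self, vis_dim=256, imu_dim=128, fused_dim=256):
        super().__init__()
        # Bidirectional GRU summarizes the forward and backward IMU sequence.
        self.imu_encoder = nn.GRU(input_size=6, hidden_size=imu_dim,
                                  batch_first=True, bidirectional=True)
        # Gate predicts per-channel weights from the concatenated features.
        self.gate = nn.Sequential(
            nn.Linear(vis_dim + 2 * imu_dim, fused_dim),
            nn.Sigmoid(),
        )
        self.vis_proj = nn.Linear(vis_dim, fused_dim)
        self.imu_proj = nn.Linear(2 * imu_dim, fused_dim)

    def forward(self, vis_feat, imu_seq):
        # vis_feat: (B, vis_dim) visual motion feature from consecutive frames
        # imu_seq:  (B, T, 6) accelerometer + gyroscope readings between frames
        _, h = self.imu_encoder(imu_seq)               # h: (2, B, imu_dim)
        imu_feat = torch.cat([h[0], h[1]], dim=-1)     # (B, 2 * imu_dim)
        w = self.gate(torch.cat([vis_feat, imu_feat], dim=-1))
        # Weighted blend: the gate can favor vision in well-lit, slow scenes
        # and inertial cues under motion blur or illumination changes.
        return w * self.vis_proj(vis_feat) + (1.0 - w) * self.imu_proj(imu_feat)
```

This is only one plausible realization of "selectively weighting" the two modalities; the actual method may use a different encoder, gating granularity, or fusion point.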
Ying He, Pengshuai Yin, F. Richard Yu, Xinyu Zeng, Zhiquan Liu
Xiangyu Li, Yonghong Hou, Qi Wu, Pichao Wang, Wanqing Li
Dai Renyue, Fang Zhijun, Gao Yongbin
Diana Wofk, René Ranftl, Matthias Müller, Vladlen Koltun
Junyu Zhu, Lina Liu, Yong Liu, Wanlong Li, Feng Wen, Hongbo Zhang