In the last decade, supervised deep learning approaches have been extensively\nemployed in visual odometry (VO) applications, which is not feasible in\nenvironments where labelled data is not abundant. On the other hand,\nunsupervised deep learning approaches for localization and mapping in unknown\nenvironments from unlabelled data have received comparatively less attention in\nVO research. In this study, we propose a generative unsupervised learning\nframework that predicts 6-DoF pose camera motion and monocular depth map of the\nscene from unlabelled RGB image sequences, using deep convolutional Generative\nAdversarial Networks (GANs). We create a supervisory signal by warping view\nsequences and assigning the re-projection minimization to the objective loss\nfunction that is adopted in multi-view pose estimation and single-view depth\ngeneration network. Detailed quantitative and qualitative evaluations of the\nproposed framework on the KITTI and Cityscapes datasets show that the proposed\nmethod outperforms both existing traditional and unsupervised deep VO methods\nproviding better results for both pose estimation and depth recovery.\n
Mihai PuscasDan XuAndrea PilzerNicu Sebe
Xiangyu LiYonghong HouQi WuPichao WangWanqing Li
Filippo AleottiFabio TosiMatteo PoggiStefano Mattoccia