Shuyao He, Yue Zhu, Yushan Dong, Hao Qin, Yuhong Mo
In this project, we present a novel approach to depth perception using a monocular camera by incorporating information from both RGB and LiDAR modalities. Our primary objective is to investigate the performance and effectiveness of different techniques for generating accurate depth estimates. We first implemented a Swin Transformer-based depth estimation model and evaluated its performance on the KITTI dataset, which contains RGB images and their corresponding ground-truth depth maps. Next, we proposed an RGB-LiDAR fusion model. We performed the necessary preprocessing steps on the dataset, such as resizing, normalization, and data augmentation, and trained both models with identical configurations for a fair comparison. We evaluated the models on the test set using metrics such as mean absolute error (MAE) and root mean squared error (RMSE). Our results demonstrate that the proposed RGB-LiDAR fusion model achieves superior depth estimation performance compared to the original Swin Transformer-based model, indicating the potential benefits of RGB-LiDAR fusion for monocular depth perception tasks. This study offers valuable insights into the strengths and weaknesses of combining RGB and LiDAR inputs and lays the foundation for future research in monocular depth perception, aiming to further improve model architectures and training techniques.
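For reference, the two evaluation metrics named above can be computed as in the following minimal sketch. This is an illustrative implementation, not the authors' evaluation code; it assumes the predicted and ground-truth depth maps are NumPy arrays of equal shape and that unannotated ground-truth pixels are encoded as zero, as is common for KITTI.

```python
import numpy as np

def depth_errors(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Compute MAE and RMSE over valid ground-truth pixels.

    Assumption: invalid (unannotated) pixels in the KITTI ground
    truth are encoded as 0 and are excluded from the evaluation.
    """
    mask = gt > 0                       # keep only pixels with LiDAR ground truth
    diff = pred[mask] - gt[mask]
    mae = float(np.abs(diff).mean())            # mean absolute error
    rmse = float(np.sqrt((diff ** 2).mean()))   # root mean squared error
    return mae, rmse
```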