Xin Jing, Kenan Du, Jiale Feng, Mao Shan
This paper proposes an improved high-precision 3D semantic mapping method for indoor scenes using RGB-D images. Current semantic mapping algorithms suffer from low semantic annotation accuracy and insufficient real-time performance. To address these issues, we first adopt the ElasticFusion algorithm to select keyframes from indoor image sequences captured by a Kinect sensor and construct a spatial model of the indoor environment. We then propose an indoor RGB-D image semantic segmentation network that uses multi-scale feature fusion to quickly and accurately obtain pixel-level object labels for the spatial point cloud model. Finally, Bayesian updating performs incremental semantic label fusion on the established point cloud model, and a dense conditional random field (CRF) refines the 3D semantic map, yielding a high-precision spatial semantic map of the indoor scene. Experimental results show that the proposed semantic mapping system processes image sequences from RGB-D sensors in real time, outputs accurate semantic segmentation results for indoor scene images together with the current local semantic map, and ultimately constructs a globally consistent, high-precision 3D semantic map of indoor scenes.
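The incremental semantic label fusion step can be illustrated with a minimal sketch. This is not the paper's implementation; it only shows the standard Bayesian update the abstract refers to: each map point keeps a class probability distribution, and every new per-pixel segmentation observation (a likelihood vector over classes) multiplies the prior and is renormalized. The class names and likelihood values below are hypothetical.

```python
import numpy as np

def bayesian_label_update(prior: np.ndarray, likelihood: np.ndarray) -> np.ndarray:
    """Fuse a new per-class observation into a point's label distribution.

    posterior(c) ∝ prior(c) * likelihood(c), renormalized to sum to 1.
    """
    posterior = prior * likelihood
    return posterior / posterior.sum()

# Hypothetical example: 3 classes (e.g. wall, chair, floor), uniform prior.
p = np.full(3, 1.0 / 3.0)

# Two successive segmentation observations of the same map point.
p = bayesian_label_update(p, np.array([0.7, 0.2, 0.1]))
p = bayesian_label_update(p, np.array([0.6, 0.3, 0.1]))

# After both observations, class 0 dominates the distribution.
print(p, p.argmax())
```

Because the update is a per-point multiply-and-normalize, it is cheap enough to run incrementally as new frames arrive, which is what makes online fusion feasible before the final dense-CRF refinement.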