Shuo Gu, Jiacheng Lu, Jian Yang, Chengzhong Xu, Hui Kong
Dense semantic scene understanding of the surrounding environment in the top view is a crucial task for autonomous vehicles. Recent LiDAR-based semantic perception works mainly focus on point-wise predictions over the LiDAR points rather than dense predictions of the environment, which makes them less suitable for path-planning tasks. Pillar and voxel representations can produce dense predictions, but generating and processing these representations is usually time-consuming. In this article, we propose a top-view semantic completion network that produces accurate, dense grid-wise predictions in real time. Specifically, we propose an online distillation strategy consisting of two parts: a student model using 2D range-view and top-view representations, and a teacher model using range-view, top-view, and voxel representations. To transfer information between the different representations, we propose a cross-view association (CVA) module that converts range-view features and 3D voxel features into top-view features. The proposed method avoids the difficulty of direct dense semantic segmentation in the top view by letting the point-wise sparse semantic segmentation module guide the dense grid-wise semantic completion. It also alleviates the computational complexity by confining the voxel representation and 3D convolutions to the teacher model. Experimental results on the SemanticKITTI (46.4% mIoU) and nuScenes-LidarSeg (47.3% mIoU) datasets demonstrate the effectiveness of the proposed sparse-guidance and online distillation strategies.
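To illustrate the general idea behind converting point-wise range-view features into a top-view (BEV) grid, the following is a minimal sketch, not the paper's actual CVA module: it assumes per-point features have already been extracted by some range-view backbone, and scatters them into a dense top-view grid with mean pooling. The grid extent, resolution, and pooling rule are all assumptions for illustration.

```python
import numpy as np

def range_to_top_view(points_xyz, point_feats, x_range=(-50.0, 50.0),
                      y_range=(-50.0, 50.0), resolution=0.5):
    """Scatter per-point features into a dense top-view grid (mean pooling).

    points_xyz:  (N, 3) LiDAR point coordinates.
    point_feats: (N, C) features per point, e.g. from a range-view backbone.
    Returns a (nx, ny, C) top-view feature map; empty cells stay zero.
    """
    nx = int((x_range[1] - x_range[0]) / resolution)
    ny = int((y_range[1] - y_range[0]) / resolution)
    c = point_feats.shape[1]

    # Map each point to a grid cell and drop points outside the extent.
    ix = ((points_xyz[:, 0] - x_range[0]) / resolution).astype(int)
    iy = ((points_xyz[:, 1] - y_range[0]) / resolution).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    ix, iy, feats = ix[valid], iy[valid], point_feats[valid]

    # Accumulate features and counts per cell, then average.
    flat = ix * ny + iy
    grid = np.zeros((nx * ny, c), dtype=np.float32)
    counts = np.zeros(nx * ny, dtype=np.float32)
    np.add.at(grid, flat, feats)
    np.add.at(counts, flat, 1.0)
    grid[counts > 0] /= counts[counts > 0, None]
    return grid.reshape(nx, ny, c)

# Toy example: two points fall in the same cell, one in another.
pts = np.array([[0.0, 0.0, 0.0], [0.1, 0.1, 0.0], [10.0, -5.0, 1.0]])
feats = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
bev = range_to_top_view(pts, feats)
print(bev.shape)  # (200, 200, 2)
```

In the actual network this projection would be followed by 2D convolutions in the top view, and learned association (rather than fixed pooling) would align the range-view and voxel branches.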
Jianbiao Mei, Yu Yang, Mengmeng Wang, Junyu Zhu, J.B. Ra, Yukai Ma, Laijian Li, Yong Liu
Maximilian Jaritz, Raoul de Charette, Émilie Wirbel, Xavier Perrotton, Fawzi Nashashibi
Meng Wang, Huilong Pi, Ruihui Li, Yunchuan Qin, Zhuo Tang, Kenli Li
Jiacheng Lu, Shuo Gu, Chengzhong Xu, Hui Kong