Xiangqian Li, Xin Tan, Zhizhong Zhang, Yuan Xie, Lizhuang Ma
Current outdoor point-cloud segmentation methods typically formulate semantic segmentation as a per-point or per-voxel classification task. Although this strategy is straightforward because it classifies each point directly, it ignores category-level relationships across the scene. As an alternative paradigm, mask classification decouples category classification from region localization, allowing the model to better capture these category-level relationships. In this paper, we propose a novel approach called the point mask transformer (PMFormer), which recasts semantic segmentation of point clouds from per-point classification to mask classification using a transformer architecture. The proposed model comprises a 3D backbone, a transformer decoder, and a segmentation head that predicts a set of binary masks, each associated with a global class label. Furthermore, to accommodate the characteristics of large, sparse outdoor point-cloud scenes, we propose three enhancements for integrating point-cloud data with the transformer: MaskMix, 3D position encoding, and attention weights. We evaluate our model on the SemanticKITTI and nuScenes datasets. Our experimental results show that the proposed method outperforms state-of-the-art semantic segmentation approaches.
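To make the mask-classification formulation concrete, the following is a minimal sketch of how a set of binary masks with per-mask class labels could be turned back into per-point semantic labels. It assumes the inference rule commonly used in mask-classification models (class probability multiplied by mask probability, summed over masks, followed by an argmax over classes); the abstract does not state that PMFormer uses exactly this rule. The function name `masks_to_semantic_labels`, the trailing "no object" class column, and the toy logits are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function mapping a mask logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def masks_to_semantic_labels(class_logits, mask_logits):
    """Combine per-mask class scores and per-point mask scores.

    class_logits: Q rows of K+1 values (assumed: last column is "no object").
    mask_logits:  Q rows of N values, one logit per point for each mask.
    Returns a list of N class indices in the range [0, K).
    """
    num_queries = len(class_logits)
    num_classes = len(class_logits[0]) - 1  # drop the assumed "no object" column
    num_points = len(mask_logits[0])

    class_probs = [softmax(row)[:num_classes] for row in class_logits]
    mask_probs = [[sigmoid(v) for v in row] for row in mask_logits]

    labels = []
    for p in range(num_points):
        # Score of class c at point p: sum over masks of P(class c) * P(point in mask).
        scores = [
            sum(class_probs[q][c] * mask_probs[q][p] for q in range(num_queries))
            for c in range(num_classes)
        ]
        labels.append(max(range(num_classes), key=scores.__getitem__))
    return labels


# Toy input: 2 masks, 2 real classes plus "no object", 4 points.
toy_class_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
toy_mask_logits = [[4.0, 4.0, -4.0, -4.0], [-4.0, -4.0, 4.0, 4.0]]
toy_labels = masks_to_semantic_labels(toy_class_logits, toy_mask_logits)
```

In this toy input, the first mask covers the first two points and votes for class 0, while the second mask covers the last two points and votes for class 1. The class decision is made once per mask rather than once per point, which is the decoupling of classification from localization that the abstract describes.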
Xiang He, Xu Li, Peizhou Ni, Xu Wang, Qimin Xu, Xixiang Liu
Abhishek Kuriyal, Vaibhav Kumar, Bharat Lohani
Jinge Song, Zhenyuan Cao, Xueyan Li, Xiuying Li, Shuxu Guo
Gang Xiao, Shuzhi Sam Ge, Qibing Wang, Ren Li, Jiawei Lu