Ziyin Zeng, Huan Qiu, Jian Zhou, Zhen Dong, Jinsheng Xiao, Bijun Li
Given the prominence of 3D sensors in recent years, 3D point clouds merit further investigation for environment perception and scene understanding. Learning accurate local and global contexts in point clouds is pivotal for semantic segmentation. Neighbor aggregation has achieved notable success in local perception, and Transformers have done the same for global perception. Nevertheless, studying each in isolation falls short of comprehensive feature learning. To address this, we investigate and integrate the structures of neighbor aggregation and Transformers. In this paper, we introduce Point Neighbor Aggregation with Transformer (PointNAT), a conceptually straightforward and effective approach to improving 3D point cloud semantic segmentation. PointNAT consists of a Neighbor Aggregation Block (NAB) for local perception, a Point Transformer Block (PTB) for global modeling, and a Hybrid Block that connects NABs and PTBs. NABs learn complex local features at varying scales through an improved neighbor aggregation operation and a multi-head mechanism. PTBs perform global attention efficiently using a small set of learnable key points. Hybrid Blocks serve as hybridizers of high- and low-frequency signals, merging the strengths of the two blocks by adaptively assigning hybrid weights to local and global contexts. We compare PointNAT with state-of-the-art networks on several benchmarks, including S3DIS, Toronto3D, and SensatUrban. PointNAT achieves mIoU scores of 77.8%, 84.7%, and 65.2% on these three datasets, respectively. Furthermore, it outperforms the baseline approach PointNeXt by 3.0%, 1.3%, and 4.2%, respectively, while using only 59.9% of the parameters and 15.2% of the FLOPs.
The results demonstrate PointNAT's superior ability in accurately segmenting large-scale 3D point cloud scenes, emphasizing its potential to advance environment perception and scene understanding. Our code is available at https://github.com/zeng-ziyin/PointNAT.
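For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of how a mean intersection-over-union score can be computed from per-point class labels. It is an illustration only, not the evaluation code from the PointNAT repository: the function name `mean_iou`, the toy labels, and the choice to skip classes absent from both prediction and ground truth are assumptions, and benchmark protocols typically accumulate counts over a whole test set rather than a single sample.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes (hypothetical helper).

    pred and target are equal-length sequences of integer class labels,
    one label per point. Classes that appear in neither sequence are
    skipped, which is an assumption of this sketch.
    """
    ious = []
    for c in range(num_classes):
        # Points labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Points labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six points, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(f"mIoU = {mean_iou(pred, target, 3):.3f}")
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. A reported score such as 77.8% corresponds to this quantity expressed as a percentage.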
Jinge Song, Zhenyuan Cao, Xueyan Li, Xiuying Li, Shuxu Guo
Yongyang Xu, Wei Tang, Ziyin Zeng, Weichao Wu, Jie Wan, Han Guo, Zhong Xie
Xiang He, Xu Li, Peizhou Ni, Xu Wang, Qimin Xu, Xixiang Liu
Fukun Yin, Zilong Huang, Tao Chen, Guozhong Luo, Gang Yu, Bin Fu