JOURNAL ARTICLE

MCTNet: Multiscale Cross-Attention-Based Transformer Network for Semantic Segmentation of Large-Scale Point Cloud

Bo Guo, Liwei Deng, Ruisheng Wang, Wenchao Guo, Alex Hay‐Man Ng, Wenfeng Bai

Year: 2023 Journal: IEEE Transactions on Geoscience and Remote Sensing Vol: 61 Pages: 1-20 Publisher: Institute of Electrical and Electronics Engineers

Abstract

In this work, we present a hybrid method that aggregates both fine-grained and globally contextual features in a hierarchical network for point cloud semantic segmentation. To overcome the limitation of convolution, which mainly extracts low-level features, we add a higher-level cross-attention-based Transformer that, together with position embedding, captures long-range relations for multiscale feature representation. Specifically, a learnable token is added to the feature sequence of a layer, and a Transformer encoder with limited scope first embeds these features. Then, instead of performing all-to-all attention, we fuse only the tokens spanning various scales. To improve efficiency, we propose a simple token-fusing architecture based on cross-attention, in which computing the attention maps is restricted to linear time because only one token is used to calculate the query. The cross-attention module can be efficiently integrated into a multiscale network to further enlarge the receptive field of attention. Experiments show that MCTNet achieves promising results on three of the largest point cloud datasets: DALES, DublinCity, and S3DIS. On the DALES benchmark, MCTNet raises the mean intersection-over-union (mIoU) to 83.3% and the overall accuracy (OA) to 98.3%, outperforming the existing baselines. We also conduct extensive ablation studies on various attention and normalization modules and discuss the effect of parameters, both to validate the descriptive power of the cross-attention module and to show how long-range dependency can be used to learn fair and unbiased features.
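The linear-time claim for the token-based query can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, omits residual connections and normalization, and uses plain Python lists. The names `token_cross_attention`, `project`, `softmax`, `coarse_token`, and `fine_tokens` are hypothetical and chosen only for illustration. With one query token and N key/value tokens, the attention map is a single row of length N, so the cost grows linearly in N instead of quadratically as in all-to-all attention.

```python
import math


def project(weight, vector):
    """Multiply a (d_out x d_in) matrix, given as nested lists, by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def token_cross_attention(query_token, tokens, w_q, w_k, w_v):
    """Single-head cross-attention where only `query_token` forms the query.

    Because there is one query, the attention map has one row of length
    len(tokens), so the cost grows linearly with the number of tokens.
    """
    q = project(w_q, query_token)
    keys = [project(w_k, t) for t in tokens]
    values = [project(w_v, t) for t in tokens]
    scale = math.sqrt(len(q))
    # One dot product per key: N scores for N tokens.
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors, one output dimension at a time.
    fused = [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
    return fused, weights


# Toy example: identity projections are an assumption; real weights are learned.
identity = [[1.0, 0.0], [0.0, 1.0]]
coarse_token = [1.0, 0.0]                            # token from one scale
fine_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # tokens from another scale
fused, weights = token_cross_attention(
    coarse_token, fine_tokens, identity, identity, identity)
print(len(weights), round(sum(weights), 6))
```

In this toy setup the token from one scale acts as the query and the tokens from another scale supply the keys and values, which is one plausible reading of fusing tokens across scales; the paper itself should be consulted for the actual architecture.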

Keywords:
Computer science, Artificial intelligence, Data mining, Encoder, Transformer, Segmentation, Feature extraction, Security token, Pattern recognition (psychology), Computer network

Metrics

Cited By: 17
FWCI (Field Weighted Citation Impact): 5.71
Refs: 68
Citation Normalized Percentile: 0.95

Topics

3D Shape Modeling and Analysis
Physical Sciences →  Engineering →  Computational Mechanics
3D Surveying and Cultural Heritage
Physical Sciences →  Earth and Planetary Sciences →  Geology
Remote Sensing and LiDAR Applications
Physical Sciences →  Environmental Science →  Environmental Engineering

Related Documents

JOURNAL ARTICLE

MPT-Net: Mask Point Transformer Network for Large Scale Point Cloud Semantic Segmentation

Zhe Jun Tang, Tat‐Jen Cham

Journal: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) Year: 2022 Pages: 10611-10618

JOURNAL ARTICLE

CFSA-Net: Efficient Large-Scale Point Cloud Semantic Segmentation Based on Cross-Fusion Self-Attention

Jun Shu, Shuai Wang, Shiqi Yu, Jie Zhang

Journal: Computers, Materials & Continua Year: 2023 Vol: 77 (3) Pages: 2677-2697