Yongtao Yu, Tao Jiang, Junyong Gao, Haiyan Guan, Dilong Li, Shangbing Gao, E. Tang, Wenhao Wang, Peng Tang, Jonathan Li
Equipped with multiple channels of laser scanners, multispectral light detection and ranging (MS-LiDAR) devices offer greater potential in earth observation tasks than their single-band counterparts, opening up a competitive solution for land cover mapping. In this paper, we develop a cross-context capsule vision transformer (CapViT) for land cover classification with MS-LiDAR data. Specifically, the CapViT is structured with three streams of capsule transformer encoders, each stacked from capsule transformer (CapFormer) blocks, to exploit long-range global feature interactions at different context scales. These cross-context feature semantics are then effectively fused to supervise accurate land cover type inference. In addition, each CapFormer block runs dual-path multi-head self-attention modules in parallel to model both spatial token correlations and channel feature interdependencies, which significantly enriches the semantics of the feature encodings. With these semantically enhanced encodings boosting the distinctiveness and quality of the feature representations, land cover classification accuracy is effectively improved. The CapViT is thoroughly evaluated on two MS-LiDAR datasets. Both quantitative assessments and comparative analyses demonstrate the competitive capability and advanced performance of the CapViT on land cover classification tasks.
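The dual-path idea described above can be illustrated with a minimal single-head sketch: one attention path operates over spatial tokens (an N-by-N attention map) and a parallel path operates over feature channels (a C-by-C attention map), with the two outputs fused. This is not the paper's implementation; the single-head form, the shared query/key/value input, and the additive fusion are simplifying assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(X):
    # X: (N tokens, C channels); tokens attend over tokens -> (N, N) map
    A = softmax(X @ X.T / np.sqrt(X.shape[1]), axis=-1)
    return A @ X

def channel_attention(X):
    # channels attend over channels -> (C, C) map
    A = softmax(X.T @ X / np.sqrt(X.shape[0]), axis=-1)
    return X @ A

def dual_path_block(X):
    # parallel spatial and channel paths; additive fusion is an assumption here
    return spatial_attention(X) + channel_attention(X)

X = np.random.default_rng(0).normal(size=(16, 8))  # 16 tokens, 8 channels
Y = dual_path_block(X)
print(Y.shape)  # (16, 8): output keeps the token-by-channel layout
```

Both paths preserve the input shape, so the fused output can feed the next block unchanged; a learned projection and multiple heads would be added in a real model.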