Daning Tan, Yu Liu, Qiaowen Jiang, Shun Sun, Ziran Ding, Guo Chen
Multimodal aerial imagery has gradually become an important data source for resource exploration and urban planning. When images from multiple sensors are used to extract building footprints, it is often difficult to acquire all modalities at the same time, and information redundancy degrades performance. In this paper, a Transformer-based building footprint extraction model built on a fully convolutional segmentation network is proposed. The model uses the Transformer self-attention mechanism to make full use of multi-modal information for joint training, and fusion and segmentation are performed via a concatenation operation. In addition, a three-to-two mechanism improves the robustness and stability of the model in practical application. Experiments on the SpaceNet 6 dataset show that the proposed approach outperforms the U-Net-based approach by 6.5-11.9%.
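As a rough illustration of the fusion idea described in this abstract (joint self-attention over multi-modal features, followed by channel-wise concatenation for segmentation), the sketch below shows one possible PyTorch formulation. This is not the authors' released code; the module names, channel sizes, layer counts, and toy input shapes are all illustrative assumptions.

```python
# Minimal sketch, assuming two modality-specific backbones (e.g., SAR and optical)
# produce feature maps of the same size; their tokens are jointly attended with a
# Transformer encoder and then concatenated for a per-pixel segmentation head.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Joint self-attention over two modalities, fused by concatenation."""

    def __init__(self, dim=64, heads=4, num_classes=2):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # 1x1 conv segmentation head applied after concatenating both modalities.
        self.head = nn.Conv2d(2 * dim, num_classes, kernel_size=1)

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (B, C, H, W) feature maps from two modality branches.
        b, c, h, w = feat_a.shape
        tokens = torch.cat(
            [feat_a.flatten(2).transpose(1, 2),   # (B, H*W, C)
             feat_b.flatten(2).transpose(1, 2)],  # (B, H*W, C)
            dim=1,
        )                                          # (B, 2*H*W, C)
        fused = self.encoder(tokens)               # joint self-attention across modalities
        fa, fb = fused.split(h * w, dim=1)
        # Concatenate attended features channel-wise, then predict per-pixel classes.
        fused_map = torch.cat(
            [fa.transpose(1, 2).reshape(b, c, h, w),
             fb.transpose(1, 2).reshape(b, c, h, w)],
            dim=1,
        )
        return self.head(fused_map)


if __name__ == "__main__":
    model = CrossModalFusion()
    sar = torch.randn(1, 64, 16, 16)      # e.g., SAR-branch features
    optical = torch.randn(1, 64, 16, 16)  # e.g., optical-branch features
    print(model(sar, optical).shape)      # torch.Size([1, 2, 16, 16])
```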
Weijia Li, Conghui He, Jiarui Fang, Juepeng Zheng, Haohuan Fu, Le Yu
Ziyi Chen, Yuhua Luo, Jiaying Zhang, Ruoyu Guo, Liai Deng, Jinghua Liu, Dilong Li, Yongtao Yu, Ammar Abulibdeh, Cheng Wang
LIAO Yuanhui, WANG Jingdong, LI Haoran, YANG Heng
Mohamed Barakat A. Gibril, Rami Al‐Ruzouq, Abdallah Shanableh, Ratiranjan Jena, Jan Bolcek, Helmi Zulhaidi Mohd Shafri, Omid Ghorbanzadeh