Extracting building footprints from satellite or aerial imagery is critical for many applications. Yet the precise delineation of buildings from very high spatial resolution remotely sensed images remains challenging. This study investigated the potential of Mask R-CNN built on the Swin Transformer and a Feature Pyramid Network (FPN) for extracting building footprints from RGB images of heterogeneous urban landscapes. The Swin Transformer and FPN were used together to extract multiscale features. The model's performance was compared with that of several instance segmentation models built on a ResNet-50 backbone, including Mask Scoring R-CNN, YOLACT, and SOLO. Results showed that the model segmented building footprints successfully, with a mAP50 (mean average precision at an IoU threshold of 0.5) of 0.85 and an F-measure of 0.89, outperforming the other evaluated instance segmentation models.
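The F-measure reported above combines precision and recall over matched building instances. The following is a minimal sketch of how such a score could be computed. It assumes that masks are boolean NumPy arrays and that a prediction counts as a true positive when it is matched one-to-one, in descending score order, to a ground-truth footprint with IoU ≥ 0.5 (the same threshold used by mAP50). The function names `mask_iou` and `f_measure_at_iou` are hypothetical, and the study's exact matching protocol may differ.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks of identical shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def f_measure_at_iou(pred_masks, pred_scores, gt_masks, iou_thr=0.5):
    """Greedy one-to-one matching of predictions to ground truth (assumed protocol).

    Predictions are visited from highest to lowest score; each one is matched to
    the unmatched ground-truth mask with the highest IoU, and the match counts as
    a true positive only if that IoU reaches iou_thr.
    Returns (precision, recall, f1).
    """
    order = np.argsort(pred_scores)[::-1]  # highest confidence first
    matched_gt = set()
    tp = 0
    for i in order:
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gt_masks):
            if j in matched_gt:
                continue  # each ground-truth building can be matched once
            iou = mask_iou(pred_masks[i], g)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched_gt.add(best_j)
            tp += 1
    fp = len(pred_masks) - tp  # unmatched predictions
    fn = len(gt_masks) - tp    # missed buildings
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage: two ground-truth buildings and three predictions on a 10x10 grid.
gt = [np.zeros((10, 10), dtype=bool), np.zeros((10, 10), dtype=bool)]
gt[0][0:4, 0:4] = True
gt[1][5:9, 5:9] = True
preds = [gt[0].copy(), np.zeros((10, 10), dtype=bool), np.zeros((10, 10), dtype=bool)]
preds[1][5:9, 5:8] = True   # partial overlap with gt[1]: IoU = 12/16 = 0.75
preds[2][0:2, 7:10] = True  # no building here, so it is a false positive
p, r, f1 = f_measure_at_iou(preds, [0.9, 0.8, 0.3], gt)
print(p, r, f1)
```

In this toy case, two of three predictions are true positives and no building is missed, so precision is 2/3, recall is 1, and the F-measure is 0.8. mAP50 differs in that it averages precision across recall levels obtained by sweeping the score threshold, rather than evaluating a single operating point.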
Kang Zhao, Jungwon Kang, Jaewook Jung, Gunho Sohn
Ahmed NourEldeen, M. El-Sayed Wahed
Kaibin Zhou, Yifan Chen, Ihor Smal, Roderik Lindenbergh
Jialiang Gao, Bin Zhang, Yuntao Wu, Chang Guo