JOURNAL ARTICLE

SeMask-Mask2Former: A Semantic Segmentation Model for High Resolution Remote Sensing Images

Abstract

With the development of remote sensing, semantic segmentation of high-resolution remote sensing images (RSIs) is increasingly essential. At the same time, the characteristics of objects in RSIs, such as large size, variation in object scales, and complex details, make it necessary to capture both long-range context and local information. Methods such as Fully Convolutional Networks (FCN) and the Pyramid Scene Parsing Network (PSPNet) lack the ability to capture long-range dependencies because of the limited receptive field of Convolutional Neural Networks (CNNs). In contrast, the self-attention mechanism in Transformer models, which captures the correlation between pixels, is remarkably capable of capturing long-range context. One of the most outstanding Transformer models is the Masked-attention Mask Transformer (Mask2Former), which adopts the mask classification method. We propose SeMask-Mask2Former, a model with a boundary loss in which Semantically Masked (SeMask) is the backbone and Mask2Former is the decoder. Concretely, mask classification, which generates one or more masks for each category to perform fine-grained segmentation, is especially suitable for handling the large intra-class and small inter-class variance of RSIs. Extensive experimental results show that SeMask-Mask2Former obtains better results in semantic segmentation of high-resolution RSIs on the ISPRS Potsdam dataset than CNN-based methods and other state-of-the-art Transformer-based methods. Extensive ablation studies conducted on the Potsdam dataset verify the contribution of each component and optimization strategy in SeMask-Mask2Former.
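
To make the mask classification idea concrete, the following is a minimal sketch of how per-query class scores and binary masks can be combined into a semantic map, in the style commonly used for MaskFormer-type decoders. It is an illustration under stated assumptions, not the authors' implementation: the function name `semantic_map`, the toy logits, and the use of plain Python lists are all hypothetical, and the abstract does not specify the inference procedure or the boundary loss.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def semantic_map(class_logits, mask_logits):
    """Combine per-query class scores and masks into a label map.

    class_logits: [Q][K+1], last column is the "no object" class (assumption).
    mask_logits:  [Q][H][W], one binary mask logit map per query.
    Returns an [H][W] grid of class indices in range(K).
    """
    num_queries = len(class_logits)
    num_classes = len(class_logits[0]) - 1
    height, width = len(mask_logits[0]), len(mask_logits[0][0])
    # Drop the "no object" probability before weighting the masks.
    class_probs = [softmax(row)[:num_classes] for row in class_logits]
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of class k at this pixel: sum over queries of
            # P(query is class k) * P(pixel belongs to query's mask).
            scores = [
                sum(class_probs[q][k] * sigmoid(mask_logits[q][y][x])
                    for q in range(num_queries))
                for k in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        labels.append(row)
    return labels


# Toy example: queries 0 and 2 both predict class 0 (two masks for one
# category), query 1 predicts class 1.
class_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [5.0, 0.0, 0.0]]
mask_logits = [
    [[4.0, -4.0], [4.0, -4.0]],    # query 0: left column
    [[-4.0, 4.0], [-4.0, -4.0]],   # query 1: top-right pixel
    [[-4.0, -4.0], [-4.0, 4.0]],   # query 2: bottom-right pixel
]
print(semantic_map(class_logits, mask_logits))  # [[0, 1], [0, 0]]
```

In this toy case class 0 is covered by two separate masks, which is the property the abstract highlights as useful when a single category has large internal variation.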

Keywords:
Computer science, Segmentation, Convolutional neural network, Artificial intelligence, Transformer, Pixel, Remote sensing, Parsing, Image resolution, High resolution, Pattern recognition (psychology), Computer vision, Geography

Metrics

Cited By: 7
FWCI (Field Weighted Citation Impact): 1.27
Refs: 16
Citation Normalized Percentile: 0.76

Topics

Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Remote-Sensing Image Classification
Physical Sciences →  Engineering →  Media Technology
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition