SeMask-Mask2Former: A Semantic Segmentation Model for High Resolution Remote Sensing Images

Yicheng Qiao; Wei Liu; Bin Liang; Pengyun Wang; Haopeng Zhang; Junli Yang

doi:10.1109/aero55745.2023.10115761

JOURNAL ARTICLE

Year: 2023 Pages: 1-6

Abstract

With the development of remote sensing, semantic segmentation of high-resolution remote sensing images (RSIs) is increasingly essential. At the same time, the characteristics of objects in RSIs, such as large size, variation in object scale, and complex details, make it necessary to capture both long-range context and local information. Methods such as Fully Convolutional Networks (FCN) and the Pyramid Scene Parsing Network (PSPNet) lack the ability to capture long-range dependencies because of the limited receptive field of Convolutional Neural Networks (CNNs). In contrast, the self-attention mechanism in Transformer models, which captures the correlation between pixels, is remarkably effective at capturing long-range context. One of the most outstanding Transformer models is the Masked-attention Mask Transformer (Mask2Former), which adopts the mask classification method. We propose SeMask-Mask2Former, a model with a boundary loss in which Semantically Masked (SeMask) serves as the backbone and Mask2Former as the decoder. Concretely, mask classification, which generates one or more masks for each category to perform fine-grained segmentation, is especially suitable for handling the large within-class and small inter-class variance of RSIs. Extensive experimental results show that SeMask-Mask2Former obtains better results in semantic segmentation of high-resolution RSIs on the ISPRS Potsdam dataset than CNN-based methods and other state-of-the-art Transformer-based methods. Extensive ablation studies conducted on the Potsdam dataset verify the contribution of each component and optimization strategy in SeMask-Mask2Former.
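The mask classification idea mentioned in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes the semantic inference rule commonly described for MaskFormer-style models, in which each query predicts a class distribution (with a final "no object" entry) and a binary mask, and the per-pixel class score is the sum over queries of class probability times mask probability. The function name `semantic_inference` and the toy logits are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def semantic_inference(class_logits, mask_logits):
    """Combine per-query class and mask predictions into a label map.

    class_logits: N rows of K+1 logits; the last entry is "no object".
    mask_logits:  N masks, each an H x W grid of logits.
    Returns an H x W grid of class indices in [0, K).
    """
    num_classes = len(class_logits[0]) - 1
    height, width = len(mask_logits[0]), len(mask_logits[0][0])
    # Drop the "no object" probability before combining with the masks.
    class_probs = [softmax(row)[:-1] for row in class_logits]
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            scores = [0.0] * num_classes
            for probs, mask in zip(class_probs, mask_logits):
                m = sigmoid(mask[y][x])
                for k in range(num_classes):
                    scores[k] += probs[k] * m
            row.append(max(range(num_classes), key=scores.__getitem__))
        labels.append(row)
    return labels


# Toy example (assumed values): queries 0 and 1 both predict class 0 but
# cover different pixels; query 2 predicts class 1 for the remaining pixels.
class_logits = [[5.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
mask_logits = [
    [[4.0, -4.0], [-4.0, -4.0]],
    [[-4.0, -4.0], [-4.0, 4.0]],
    [[-4.0, 4.0], [4.0, -4.0]],
]
labels = semantic_inference(class_logits, mask_logits)
print(labels)  # [[0, 1], [1, 0]]
```

In this toy case two separate masks are assigned to the same category, which is the property the abstract highlights as useful when a class has large within-class variance.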

Keywords:

Computer science; Segmentation; Convolutional neural network; Artificial intelligence; Transformer; Pixel; Remote sensing; Parsing; Image resolution; High resolution; Pattern recognition (psychology); Computer vision; Geography

Metrics

FWCI (Field Weighted Citation Impact): 1.27

Citation Normalized Percentile: 0.76

Topics

Advanced Neural Network Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Remote-Sensing Image Classification

Physical Sciences → Engineering → Media Technology

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

ConvNeXt-Mask2Former: A Semantic Segmentation Model for Land Classification in Remote Sensing Images

Mask2Former with Improved Query for Semantic Segmentation in Remote-Sensing Images

Object-Enhanced Semantic Segmentation Model for High-Resolution Remote Sensing Images

Multi-Semantic Markov Random Field Model for Semantic Segmentation of High-Resolution Remote Sensing Images

Dynamic High-Resolution Network for Semantic Segmentation in Remote-Sensing Images