Remote sensing image segmentation is a specialized form of semantic segmentation that presents challenges rarely found in general semantic segmentation tasks. This study addresses two key issues: the highly imbalanced foreground-background distribution and the presence of many small objects intertwined with complex backgrounds. Existing methods rely heavily on convolutional neural networks (CNNs), which, because of their local receptive fields, struggle to capture global context. Motivated by the powerful global modeling capability of the Swin Transformer [1], this paper proposes a novel U-shaped network for remote sensing image semantic segmentation, called Light Swin Transformer_Unet. In this network, the attention computation of the Swin Transformer is modified and employed in the encoder. In addition, an adaptive multi-level feature pyramid pooling module based on CNNs is integrated into the auxiliary decoder of the Unet, forming a novel parallel connection structure with feature-processing capability. This module effectively compensates for the Transformer's limited attention to local features. Experimental results on the LoveDA [2] dataset demonstrate that the proposed network outperforms pure CNN networks, pure Transformer networks, and networks that fuse CNNs and Transformers in other forms. Moreover, compared with the Transformer alone, the proposed network achieves a slight performance improvement while reducing the parameter count. These findings provide a reference for CNN-Transformer fusion networks and offer practical methods for addressing the challenges in this field.
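The multi-level feature pyramid pooling mentioned above can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the paper's implementation: the bin sizes (1, 2, 3, 6) follow common PSPNet-style pyramid pooling, and nearest-neighbour upsampling stands in for whatever interpolation and convolutional fusion the actual module uses. Each pyramid level average-pools the feature map to a coarse grid, upsamples it back, and the levels are concatenated with the input along the channel axis, injecting multi-scale context alongside the original local features.

```python
import numpy as np

def adaptive_avg_pool2d(x, out_size):
    """Average-pool a (C, H, W) feature map to (C, out_size, out_size).

    Bin boundaries use floor/ceil splits, so every input pixel is covered.
    """
    c, h, w = x.shape
    out = np.zeros((c, out_size, out_size), dtype=x.dtype)
    for i in range(out_size):
        hs, he = (i * h) // out_size, -(-((i + 1) * h) // out_size)
        for j in range(out_size):
            ws, we = (j * w) // out_size, -(-((j + 1) * w) // out_size)
            out[:, i, j] = x[:, hs:he, ws:we].mean(axis=(1, 2))
    return out

def upsample_nearest(x, h, w):
    """Nearest-neighbour upsample a (C, h0, w0) map to (C, h, w)."""
    c, h0, w0 = x.shape
    rows = (np.arange(h) * h0) // h
    cols = (np.arange(w) * w0) // w
    return x[:, rows][:, :, cols]

def pyramid_pool(x, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with pooled-and-upsampled context at each level.

    Output has C * (1 + len(bin_sizes)) channels; a real module would
    follow this with 1x1 convolutions to fuse and compress the channels.
    """
    c, h, w = x.shape
    branches = [x]
    for b in bin_sizes:
        branches.append(upsample_nearest(adaptive_avg_pool2d(x, b), h, w))
    return np.concatenate(branches, axis=0)
```

For a 2-channel 12x12 input, `pyramid_pool` returns a 10-channel 12x12 map: the original two channels plus one pooled copy per bin size. The 1x1 bin reduces to a broadcast global average, which is what lets the branch carry image-wide context back to every spatial position.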
Ronghuan Zhang, Jing Zhao, Ming Li, Qingzhi Zou
Xin He, Yong Zhou, Jiaqi Zhao, Di Zhang, Rui Yao, Yong Xue
Fuxiang Liu, Zhiqiang Hu, Lei Li, Hanlu Li, Xinxin Liu
Lili Fan, Yu Zhou, Hongmei Liu, Yunjie Li, Dongpu Cao