Cross-layer fusion enhanced transformer network for remote sensing scene classification

Xiaoli Gao; Ming Zhang; Dahua Yu; Jianjun Li; Kehong Liu; Guoqing Li

doi:10.1088/2631-8695/adf8b2

JOURNAL ARTICLE

Year: 2025; Journal: Engineering Research Express; Vol: 7 (3); Pages: 035262-035262; Publisher: IOP Publishing

Abstract

The rapid development of deep learning, especially the application of CNNs and Transformers, opens a new direction for remote sensing scene classification. CNNs struggle to capture multi-scale features because their receptive field is limited. Although the feature pyramid network (FPN) is a multi-scale structure, its fusion mechanism inherently limits the extraction of fine-grained information. The Transformer architecture excels at capturing long-range dependencies through its self-attention mechanism. While multi-head self-attention allows it to attend to both global and local patterns, there is still room for improvement in explicitly enhancing the extraction of local spatial features and edge detail information. To overcome these limitations, this paper proposes a novel end-to-end learning model, called cross-layer fusion enhanced transformer network (CLFETNet), for remote sensing scene classification. The contribution of this paper is to embed three novel modules into the traditional CNN model. First, the FPN is used to extract multi-scale features, and a fine feature aggregation module (RFA) is designed to integrate complementary information from different layers and obtain more reliable and accurate features. Second, the efficient frequency transformer (EFT) is proposed to mine global information, local spatial features, and edge detail information. Finally, the graph-aware fusion (GAF) module is proposed to model the relationships among regions in the feature graph with a graph convolutional network (GCN), which enhances intra-class feature consistency. Extensive experiments were carried out on three publicly accessible RS scene datasets. The results show that the CLFETNet framework outperforms many existing methods, is robust in remote sensing scene classification tasks, and can effectively handle complex scenes and multi-scale feature challenges.
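The abstract states only that the GAF module models relationships among regions with a GCN; it gives no formulation. As a rough illustration of what one graph-convolution step over region features can look like, the sketch below uses the widely used symmetric-normalized propagation rule, ReLU(D^{-1/2}(A + I)D^{-1/2} H W). Everything specific here is an assumption made for illustration and is not taken from the paper: the 4-neighbour grid graph over regions, the function names (`grid_adjacency`, `normalize_adjacency`, `gcn_layer`), and the toy feature and weight values.

```python
import math


def grid_adjacency(rows, cols):
    """4-neighbour adjacency for a rows x cols grid of regions (assumed graph layout)."""
    n = rows * cols
    adj = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            # Link each region to its right and lower neighbour, symmetrically.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    adj[i][j] = adj[j][i] = 1.0
    return adj


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(features, adj, weight):
    """One graph-convolution step: ReLU(A_norm @ features @ weight)."""
    a_norm = normalize_adjacency(adj)
    out = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Toy usage: a 2 x 2 grid of regions, each with a 2-dimensional feature.
adj = grid_adjacency(2, 2)
features = [[1.0, 1.0] for _ in range(4)]   # hypothetical region features
weight = [[1.0, 0.0], [0.0, 1.0]]           # identity weights for readability
fused = gcn_layer(features, adj, weight)
```

Each output row is a degree-normalized mix of a region's own feature and those of its neighbours, which is the general mechanism by which a GCN can pull features of related regions closer together. A trained model would learn `weight` and would likely build the graph from feature similarity rather than a fixed grid.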

Keywords:

Transformer; Fusion; Computer science; Artificial intelligence; Remote sensing; Pattern recognition (psychology); Engineering; Geology; Electrical engineering; Voltage

Metrics

Cited By

FWCI (Field Weighted Citation Impact): 0.00

Refs

Citation Normalized Percentile: 0.37

Is in top 1%

Is in top 10%

Topics

Remote-Sensing Image Classification

Physical Sciences → Engineering → Media Technology

Advanced Image Fusion Techniques

Physical Sciences → Engineering → Media Technology

Remote Sensing and Land Use

Physical Sciences → Earth and Planetary Sciences → Atmospheric Science

Related Documents

Remote Sensing Scene Classification Using Spatial Transformer Fusion Network

A Hierarchical Graph-Enhanced Transformer Network for Remote Sensing Scene Classification

A Cross-Layer Nonlocal Network for Remote Sensing Scene Classification

Cross‐Transformer Fusion Network for Multimodal Remote Sensing Image Classification

Multiple Hierarchical Cross-Scale Transformer for Remote Sensing Scene Classification