JOURNAL ARTICLE

Cross-layer fusion enhanced transformer network for remote sensing scene classification

Xiaoli Gao, Ming Zhang, Dahua Yu, Jianjun Li, Kehong Liu, Guoqing Li

Year: 2025 Journal: Engineering Research Express Vol: 7 (3) Pages: 035262-035262 Publisher: IOP Publishing

Abstract

The rapid development of deep learning, especially the application of CNNs and Transformers, opens a new direction for remote sensing scene classification. CNNs struggle to capture multi-scale features because of their limited receptive field. Although the feature pyramid network (FPN) is a multi-scale structure, its fusion mechanism inherently limits the extraction of fine-grained information. The Transformer architecture excels at capturing long-range dependencies through its self-attention mechanism. While multi-head self-attention allows it to attend to both global and local patterns, there is still room for improvement in explicitly enhancing the extraction of local spatial features and edge detail information. To overcome these limitations, this paper proposes a novel end-to-end learning model, called cross-layer fusion enhanced transformer network (CLFETNet), for remote sensing scene classification. The main contribution of this paper is to embed three novel modules into a traditional CNN model. First, we use the FPN to extract multi-scale features and design a fine feature aggregation module (RFA) that integrates complementary information from different layers to obtain more reliable and accurate features. Second, the efficient frequency transformer (EFT) is proposed to deeply mine global information, local spatial features, and edge detail information. Finally, the graph-aware fusion (GAF) module is proposed to model the relationships among regions in the feature graph with a graph convolutional network (GCN), which enhances intra-class feature consistency. Extensive experiments were carried out on three publicly accessible remote sensing (RS) scene datasets. The results show that CLFETNet outperforms many existing methods, is robust in remote sensing scene classification tasks, and can effectively handle complex scenes and multi-scale feature challenges.
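
The abstract says the GAF module relates regions of the feature graph through a GCN but gives no formulation. As a rough illustration only, the sketch below applies one step of the widely used GCN propagation rule, ReLU(D^{-1/2}(A + I)D^{-1/2} H W), to a toy region graph. It is not the paper's GAF module: the helper names, the toy adjacency matrix, the region features, and the identity weight matrix are all assumptions made for the example.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square, non-negative adjacency matrix."""
    n = len(adj)
    # Add self-loops so each region keeps a share of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weight)
    return [[max(0.0, v) for v in row] for row in projected]


# Hypothetical toy graph: regions 0 and 1 are connected, region 2 is isolated.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 0.0],
       [0.0, 0.0, 0.0]]
# Hypothetical 2-dimensional feature vector per region.
features = [[1.0, 0.0],
            [0.0, 1.0],
            [2.0, 2.0]]
# Identity weights, so only the effect of graph smoothing is visible.
weight = [[1.0, 0.0],
          [0.0, 1.0]]

out = gcn_layer(adj, features, weight)
print(out)
```

In this toy case the two connected regions end up with identical features while the isolated region is unchanged, which is the kind of smoothing that could encourage the intra-class consistency the abstract mentions.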

Keywords:
Transformer, Fusion, Computer science, Artificial intelligence, Remote sensing, Pattern recognition (psychology), Engineering, Geology, Electrical engineering, Voltage

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 68
Citation Normalized Percentile: 0.37

Topics

Remote-Sensing Image Classification
Physical Sciences →  Engineering →  Media Technology
Advanced Image Fusion Techniques
Physical Sciences →  Engineering →  Media Technology
Remote Sensing and Land Use
Physical Sciences →  Earth and Planetary Sciences →  Atmospheric Science

Related Documents

JOURNAL ARTICLE

A Hierarchical Graph-Enhanced Transformer Network for Remote Sensing Scene Classification

Ziwei Li, Weiming Xu, Shiyu Yang, Juan Wang, Hua Su, Zhanchao Huang, Sheng Wu

Journal: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing Year: 2024 Vol: 17 Pages: 20315-20330

JOURNAL ARTICLE

A Cross-Layer Nonlocal Network for Remote Sensing Scene Classification

Ming Li, Lin Lei, Yuli Sun, Xiao Li, Gangyao Kuang

Journal: IEEE Geoscience and Remote Sensing Letters Year: 2021 Vol: 19 Pages: 1-5

JOURNAL ARTICLE

Cross‐Transformer Fusion Network for Multimodal Remote Sensing Image Classification

Huiqing Wang, Zhongyu Li, Linfeng Wu

Journal: The Photogrammetric Record Year: 2025 Vol: 40 (191)