JOURNAL ARTICLE

A Spatial–Frequency Combined Transformer for Cloud Removal of Optical Remote Sensing Images

Feng ZhaoChao DingXin LiRunliang XiaCaifeng WuXin Lyu

Year: 2025 Journal:   Remote Sensing Vol: 17 (9)Pages: 1499-1499   Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Cloud removal is a vital preprocessing step in optical remote sensing images (RSIs), directly enhancing image quality and providing a high-quality data foundation for downstream tasks, such as water body extraction and land cover classification. Existing methods attempt to combine spatial and frequency features for cloud removal, but they rely on shallow feature concatenation or simplistic addition operations, which fail to establish effective cross-domain synergistic mechanisms. These approaches lead to edge blurring and noticeable color distortions. To address this issue, we propose a spatial–frequency collaborative enhancement Transformer network named SFCRFormer, which significantly improves cloud removal performance. The core of SFCRFormer is the spatial–frequency combined Transformer (SFCT) block, which implements cross-domain feature reinforcement through a dual-branch spatial attention (DBSA) module and frequency self-attention (FreSA) module to effectively capture global context information. The DBSA module enhances the representation of spatial features by decoupling spatial-channel dependencies via parallelized feature refinement paths, surpassing the performance of traditional single-branch attention mechanisms in maintaining the overall structure of the image. FreSA leverages fast Fourier transform to convert features into the frequency domain, using frequency differences between object and cloud regions to achieve precise cloud detection and fine-grained removal. In order to further enhance the features extracted by DBSA and FreSA, we design the dual-domain feed-forward network (DDFFN), which effectively improves the detail fidelity of the restored image by multi-scale convolution for local refinement and frequency transformation for global structural optimization. A composite loss function, incorporating Charbonnier loss and Structural Similarity Index (SSIM) loss, is employed to optimize model training and balance pixel-level accuracy with structural fidelity. Experimental evaluations on the public datasets demonstrate that SFCRFormer outperforms state-of-the-art methods across various quantitative metrics, including PSNR and SSIM, while delivering superior visual results.

Keywords:
Remote sensing Cloud computing Transformer Computer science Geology Electrical engineering Engineering

Metrics

1
Cited By
3.52
FWCI (Field Weighted Citation Impact)
67
Refs
0.83
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Advanced Image Fusion Techniques
Physical Sciences →  Engineering →  Media Technology
Remote Sensing in Agriculture
Physical Sciences →  Environmental Science →  Ecology
Remote-Sensing Image Classification
Physical Sciences →  Engineering →  Media Technology
© 2026 ScienceGate Book Chapters — All rights reserved.