Qiyuan Zhang, Xiaodan Zhang, Chen Quan, Tong Zhao, Wei Huo, Yuanchen Huang
Spatiotemporal fusion techniques can generate remote sensing imagery with high spatial and temporal resolution, thereby facilitating Earth observation. However, traditional methods are constrained by linear assumptions; generative adversarial networks suffer from mode collapse; convolutional neural networks struggle to capture global context; and Transformers are hard to scale because of their quadratic computational complexity and high memory consumption. To address these challenges, this study introduces an end-to-end remote sensing image spatiotemporal fusion approach based on the Mamba architecture (the Mamba spatiotemporal fusion model, Mamba-STFM), marking the first application of Mamba in this domain and presenting a novel paradigm for spatiotemporal fusion model design. Mamba-STFM consists of a feature extraction encoder and a feature fusion decoder. At the core of the encoder is the visual state space-FuseCore-AttNet block (VSS-FCAN block), which deeply integrates linear-complexity cross-scan global perception with a channel attention mechanism, significantly reducing quadratic-level computation and memory overhead while improving inference throughput through parallel scanning and kernel fusion techniques. The core of the decoder is the spatiotemporal mixture-of-experts fusion module (STF-MoE block), composed of our novel spatial expert and temporal expert modules. The spatial expert adaptively adjusts channel weights to optimize spatial feature representation, enabling precise alignment and fusion of multi-resolution images, while the temporal expert incorporates a temporal squeeze-and-excitation mechanism and selective state space model (SSM) techniques to efficiently capture short-range temporal dependencies, maintain linear sequence-modeling complexity, and further enhance overall spatiotemporal fusion throughput.
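As a minimal illustration of the channel-attention idea underlying the squeeze-and-excitation mechanism mentioned above, the sketch below recalibrates a feature map by gating each channel with a learned weight in (0, 1). This is a generic squeeze-and-excitation example in NumPy, not the paper's actual VSS-FCAN or STF-MoE implementation; the function name and weight shapes are illustrative assumptions.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Generic channel attention (squeeze-and-excitation) over a feature map.

    x:  (C, H, W) feature map
    w1: (C//r, C) reduction weights (bottleneck ratio r)
    w2: (C, C//r) expansion weights
    Returns a recalibrated feature map with the same shape as x.
    """
    # Squeeze: global average pooling per channel -> vector of shape (C,)
    z = x.mean(axis=(1, 2))
    # Excite: bottleneck MLP, ReLU then sigmoid, yields per-channel gates in (0, 1)
    h = np.maximum(w1 @ z, 0.0)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))
    # Scale: reweight each channel of the input by its gate
    return x * s[:, None, None]

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C // r, C))
w2 = rng.normal(size=(C, C // r))
y = squeeze_excite(x, w1, w2)
print(y.shape)  # (8, 4, 4)
```

Because each channel is multiplied by a sigmoid gate, the output never exceeds the input in magnitude; the network learns which channels to emphasize or suppress.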
Extensive experiments on public datasets demonstrate that Mamba-STFM outperforms existing methods in fusion quality; ablation studies validate the effectiveness of each core module; and efficiency analyses and application comparisons further confirm the model’s superior performance.