MFFNet: a wavelet transform-based multimodal frequency fusion network for remote sensing semantic segmentation

Chao Li; Haitao Lyu; Weipeng Jing; Ye Yuan; Guangliang Cheng

doi:10.1080/15481603.2025.2534740


JOURNAL ARTICLE


Year: 2025 Journal: GIScience & Remote Sensing Vol: 62 (1) Publisher: Taylor & Francis


Abstract

The use of multimodal data for semantic segmentation in remote sensing has attracted considerable interest because it enables the integration of complementary information from different sensors. However, conventional multimodal fusion methods operate primarily in the spatial domain. Given the substantial divergence and inherent redundancy across modalities, direct spatial-domain fusion often accumulates irrelevant information and loses useful features. Furthermore, spatial-domain fusion alone cannot fully exploit the complementary characteristics of multimodal data. To address these challenges, we propose a wavelet transform-based multimodal frequency fusion network (MFFNet), which compensates for the limitations of spatial-domain fusion by incorporating frequency-domain information. Specifically, we design a spatial-frequency domain wavelet attention fusion module (SFWAF), which uses weight-shared spatial-domain branches to extract generic spatial features for each modality. The SFWAF module uses the discrete wavelet transform (DWT) to map the features of each modality into the frequency domain for fusion, and adaptively combines the dual-domain features through a learnable weighting factor. Additionally, we propose a lightweight frequency-enhanced feature fusion (FEF) module for multiscale feature integration. This module fuses the high-frequency components of the different modalities using a fixed fusion strategy to preserve critical edge and detail information. Extensive experiments on the ISPRS Vaihingen, ISPRS Potsdam, and WHU-OPT-SAR datasets demonstrate that MFFNet outperforms traditional multimodal fusion methods, achieving a mean intersection over union (mIoU) of 84.21% and 85.88% and an overall accuracy (OA) of 92.26% and 91.16% on the Vaihingen and Potsdam datasets, respectively.
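The abstract describes fusion in the wavelet domain: each modality's features are decomposed with a DWT, the sub-bands are fused, and the result is blended with a spatial-domain fusion through a learnable weight. The sketch below illustrates that general idea only. It is not the authors' SFWAF or FEF implementation, and several choices are assumptions because the abstract does not specify them: a single-level orthonormal Haar DWT, averaging of the low-frequency (LL) bands, a max-absolute-value rule as the "fixed" high-frequency fusion strategy, a plain average standing in for the learned spatial branch, and a scalar weight squashed by a sigmoid. All function names (`haar_dwt2`, `fuse_high`, `dual_domain_fusion`, etc.) are hypothetical.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar DWT over the last two axes (H, W even).
    Returns (LL, D1, D2, D3): one approximation band and three detail bands."""
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency approximation
    d1 = (a - b + c - d) / 2.0   # differences across columns
    d2 = (a + b - c - d) / 2.0   # differences across rows
    d3 = (a - b - c + d) / 2.0   # diagonal differences
    return ll, d1, d2, d3

def haar_idwt2(ll, d1, d2, d3):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape[-2:]
    x = np.empty(ll.shape[:-2] + (2 * h, 2 * w), dtype=ll.dtype)
    x[..., 0::2, 0::2] = (ll + d1 + d2 + d3) / 2.0
    x[..., 0::2, 1::2] = (ll - d1 + d2 - d3) / 2.0
    x[..., 1::2, 0::2] = (ll + d1 - d2 - d3) / 2.0
    x[..., 1::2, 1::2] = (ll - d1 - d2 + d3) / 2.0
    return x

def fuse_high(h1, h2):
    # Assumed fixed rule: keep the larger-magnitude detail coefficient,
    # which tends to preserve the stronger edge response of either modality.
    return np.where(np.abs(h1) >= np.abs(h2), h1, h2)

def frequency_fusion(f1, f2):
    ll1, *hi1 = haar_dwt2(f1)
    ll2, *hi2 = haar_dwt2(f2)
    ll = 0.5 * (ll1 + ll2)  # assumed: average the low-frequency bands
    hi = [fuse_high(p, q) for p, q in zip(hi1, hi2)]
    return haar_idwt2(ll, *hi)

def dual_domain_fusion(f1, f2, alpha_logit):
    spatial = 0.5 * (f1 + f2)  # placeholder for a learned spatial branch
    freq = frequency_fusion(f1, f2)
    alpha = 1.0 / (1.0 + np.exp(-alpha_logit))  # sigmoid keeps weight in (0, 1)
    return alpha * spatial + (1.0 - alpha) * freq

# Usage: two single-channel 8x8 "feature maps" from different modalities.
rng = np.random.default_rng(0)
opt = rng.standard_normal((8, 8))
dsm = rng.standard_normal((8, 8))
out = dual_domain_fusion(opt, dsm, alpha_logit=0.0)
print(out.shape)  # (8, 8)
```

Because the Haar transform is orthonormal, `haar_idwt2(*haar_dwt2(x))` reconstructs `x` exactly, so any change in the output comes only from the fusion rules. In a trained network, `alpha_logit` would be a learnable parameter, and the spatial branch would be a learned layer rather than an average.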

Keywords:

Computer science Artificial intelligence Image fusion Fusion Wavelet Segmentation Frequency domain Pattern recognition (psychology) Sensor fusion Weighting Wavelet transform Feature (linguistics) Domain (mathematical analysis) Computer vision Data mining Mathematics Image (mathematics)




Related Documents

MFFNet: multimodal feature fusion network for point cloud semantic segmentation

Learning Frequency-Domain Fusion for Multimodal Remote Sensing Semantic Segmentation

MMFNet: A Mamba-Based Multimodal Fusion Network for Remote Sensing Image Semantic Segmentation

Vision Foundation Model Guided Multimodal Fusion Network for Remote Sensing Semantic Segmentation

FTransDeepLab: Multimodal Fusion Transformer-Based DeepLabv3+ for Remote Sensing Semantic Segmentation