Infrared–Visible Image Fusion via Cross-Modal Guided Dual-Branch Networks

T. J. Zhu; Jinyong Chen; Gang Wang

doi:10.3390/app152212185

JOURNAL ARTICLE

Year: 2025
Journal: Applied Sciences
Vol: 15 (22)
Pages: 12185
Publisher: Multidisciplinary Digital Publishing Institute

Abstract

In low-altitude aerial drone data fusion, fusing infrared and visible-light images remains challenging because of large modal differences, insufficient cross-modal alignment, and limited global context modeling. Traditional methods struggle to extract complementary information across modalities, while deep learning methods either lack a sufficiently global receptive field (convolutional neural networks) or fail to preserve local details (standard Transformers). To address these issues, we propose a Cross-modal Guided Dual-Branch Network (CGDBN) that combines convolutional neural networks with a Transformer architecture. The framework makes four contributions: (1) a Target-modal Feature Extraction Mechanism (TMFEM) module specialized for the thermal characteristics of infrared features, which does not need to process visible-light features; (2) Simplified Linear Attention Blocks (SLABs), introduced as a module to improve global context capture; (3) a Cross-Modal Interaction Mechanism (CMIM) module for bidirectional feature interaction; and (4) a Density Adaptive Multimodal Fusion (DAMF) module that weights modal contributions based on content analysis. This asymmetric design recognizes that different types of images have different characteristics and require targeted processing. In experiments on the AVMS, M3FD, and TNO datasets, the proposed model achieves a peak signal-to-noise ratio (PSNR) of 16.2497 on AVMS, 0.9971 higher than the best benchmark method, YDTR (approximately 15.2526); 16.5044 on M3FD, 0.7480 higher than YDTR (approximately 15.7564); and 17.3956 on TNO, 0.7934 higher than YDTR (approximately 16.6022). On all other indicators, its overall performance ranks among the top of the compared models. The method has broad application prospects in fields such as drone data fusion.
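For readers unfamiliar with the headline metric, the following is a minimal sketch of the conventional PSNR definition, together with a check of the margins over YDTR quoted in the abstract. It assumes 8-bit pixel values (peak 255) and a single reference image. The abstract does not state how PSNR is aggregated over the infrared and visible sources, so the function names `psnr` and `psnr_margin` and the default peak value are illustrative assumptions, not the authors' evaluation code.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], fused: Sequence[float], peak: float = 255.0) -> float:
    """Conventional PSNR in dB between two equally sized flat pixel sequences.

    Assumption: pixels are on a 0..peak scale (255 for 8-bit images).
    """
    if len(reference) != len(fused) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - f) ** 2 for r, f in zip(reference, fused)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


def psnr_margin(proposed: float, baseline: float) -> float:
    """Difference between two reported PSNR values, rounded to 4 decimals."""
    return round(proposed - baseline, 4)


# PSNR values quoted in the abstract: (proposed model, YDTR baseline).
reported = {
    "AVMS": (16.2497, 15.2526),
    "M3FD": (16.5044, 15.7564),
    "TNO": (17.3956, 16.6022),
}

# Recompute the stated improvement on each dataset.
margins = {name: psnr_margin(p, b) for name, (p, b) in reported.items()}
```

Recomputing the differences this way gives 0.9971, 0.7480, and 0.7934 for AVMS, M3FD, and TNO, matching the margins stated in the abstract.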

Related Documents

Dual-branch visible and infrared image fusion transformer

Dual-Branch Infrared and Visible Image Fusion Framework

Using Edge-Guided Cross-Modal Transformer for Infrared and Visible Light Image Fusion

TCTFusion: A Triple-Branch Cross-Modal Transformer for Adaptive Infrared and Visible Image Fusion

Cross-Modal Transformers for Infrared and Visible Image Fusion