JOURNAL ARTICLE

Infrared–Visible Image Fusion via Cross-Modal Guided Dual-Branch Networks

T. J. Zhu, Jinyong Chen, Gang Wang

Year: 2025   Journal: Applied Sciences   Vol: 15 (22)   Pages: 12185-12185   Publisher: Multidisciplinary Digital Publishing Institute

Abstract

In low-altitude aerial drone data fusion, fusing infrared and visible-light images remains challenging because of large modal differences, insufficient cross-modal alignment, and limited global context modeling. Traditional methods struggle to extract complementary information across modalities, while deep learning methods either lack a sufficient global receptive field (convolutional neural networks) or fail to preserve local details (standard Transformers). To address these issues, we propose a Cross-modal Guided Dual-Branch Network (CGDBN) that combines convolutional neural networks with a Transformer architecture. Our contributions are as follows: we design a Target-modal Feature Extraction Mechanism (TMFEM) module specialized for the thermal characteristics of infrared images, which does not require processing visible-light features; we introduce Simplified Linear Attention Blocks (SLABs) into the framework to improve global context capture; we design a Cross-Modal Interaction Mechanism (CMIM) module for bidirectional feature interaction; and we design a Density Adaptive Multimodal Fusion (DAMF) module that weights modal contributions based on content analysis. This asymmetric design recognizes that the two image modalities have different characteristics and require targeted processing. Experiments on the AVMS, M3FD, and TNO datasets show that the proposed model achieves a peak signal-to-noise ratio (PSNR) of 16.2497 on AVMS, 0.9971 higher than the best benchmark method, YDTR (approximately 15.2526); 16.5044 on M3FD, 0.7480 higher than YDTR (approximately 15.7564); and 17.3956 on TNO, 0.7934 higher than YDTR (approximately 16.6022). On all other indicators, its overall performance ranks among the top of the compared models. The method has broad application prospects in fields such as drone data fusion.
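
For readers unfamiliar with the headline metric, the following is a minimal sketch of how a PSNR value between a source image and a fused image can be computed. It is not the authors' evaluation code. The function name `psnr`, the nested-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration; the abstract does not state how scores are aggregated over the infrared and visible sources.

```python
import math


def psnr(reference, fused, max_value=255.0):
    """PSNR between two equally sized grayscale images given as nested lists.

    Assumption: pixel values lie in [0, max_value] (255 for 8-bit images).
    """
    if len(reference) != len(fused) or any(
        len(r) != len(f) for r, f in zip(reference, fused)
    ):
        raise ValueError("images must have the same shape")

    n_pixels = sum(len(row) for row in reference)
    if n_pixels == 0:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    mse = sum(
        (a - b) ** 2
        for r, f in zip(reference, fused)
        for a, b in zip(r, f)
    ) / n_pixels

    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded

    # PSNR = 10 * log10(peak^2 / MSE)
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value means the fused image deviates less, in the mean-squared sense, from the image it is compared against.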
