Most existing image fusion methods rely on an adversarial learning game to fuse infrared and visible images. However, such a single adversarial mechanism tends to ignore global contextual information in the image fusion task. To this end, this paper proposes a CNN-Transformer dual-process-based generative adversarial network (CTDPGAN) to fuse infrared and visible images. In the generator, a dual-process module composed of a CNN block and a Swin-Transformer block is proposed. The channel filter and spatial filter in the CNN block adaptively extract complementary information from the images of different modalities while preserving the shallow features of the source images. The Swin-Transformer Block (STRB) is designed to establish local attention by partitioning features into non-overlapping windows and then to bridge global attention through cross-window interaction. In addition, we introduce adversarial learning into the training process: dual-channel Transformer discriminators are designed to improve the discrimination of the fused image. Thus, the fused image learns the distribution of global contextual information from the source images and balances the competing visible and infrared domains. Moreover, we introduce the concepts of primary and auxiliary features into the structural similarity loss function and the spatial frequency loss function, which enables the generator to produce a fused image that retains thermal radiation information as well as rich detail information. Finally, experimental results demonstrate that, in both subjective and objective assessments, our model produces results comparable or superior to state-of-the-art image fusion methods.
Dongyu Rao, Tianyang Xu, Xiao-Jun Wu
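As a concrete illustration of the primary/auxiliary loss weighting described above, the following is a minimal PyTorch sketch of a combined structural similarity and spatial frequency loss. The spatial_frequency function follows the standard SF definition; the primary_auxiliary_loss function, the weight alpha, the choice of which modality is primary for each term, and the use of pytorch_msssim's ssim are illustrative assumptions, not the paper's exact formulation.

```python
import torch
from pytorch_msssim import ssim  # third-party SSIM (pip install pytorch-msssim)


def spatial_frequency(img: torch.Tensor) -> torch.Tensor:
    """Per-image spatial frequency of a (B, C, H, W) batch.

    SF = sqrt(RF^2 + CF^2), where RF and CF are the root-mean-square
    horizontal and vertical first-order differences (standard definition).
    """
    row_diff = img[:, :, :, 1:] - img[:, :, :, :-1]  # horizontal differences
    col_diff = img[:, :, 1:, :] - img[:, :, :-1, :]  # vertical differences
    rf = torch.sqrt((row_diff ** 2).mean(dim=(1, 2, 3)))
    cf = torch.sqrt((col_diff ** 2).mean(dim=(1, 2, 3)))
    return torch.sqrt(rf ** 2 + cf ** 2)


def primary_auxiliary_loss(fused, ir, vis, alpha=0.7):
    """Hypothetical primary/auxiliary weighting of SSIM and SF terms.

    Assumption: the infrared image is treated as primary for structural
    similarity (thermal radiation) and the visible image as primary for
    spatial frequency (texture detail); alpha > 0.5 is the primary weight
    and (1 - alpha) the auxiliary one. Inputs are assumed to lie in [0, 1].
    """
    l_ssim = (alpha * (1 - ssim(fused, ir, data_range=1.0))
              + (1 - alpha) * (1 - ssim(fused, vis, data_range=1.0)))
    sf_f, sf_ir, sf_vis = map(spatial_frequency, (fused, ir, vis))
    l_sf = (alpha * (sf_vis - sf_f).abs().mean()
            + (1 - alpha) * (sf_ir - sf_f).abs().mean())
    return l_ssim + l_sf
```

For example, calling primary_auxiliary_loss(fused, ir, vis) on normalized grayscale batches yields a scalar that a generator could minimize alongside its adversarial terms.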