The goal of infrared and visible image fusion is to generate a fused image that integrates both prominent targets and fine textures. However, many existing fusion algorithms overemphasize visual quality and traditional statistical evaluation metrics while neglecting the requirements of real-world applications, especially high-level vision tasks. To address this issue, this paper proposes a semantic-segmentation-driven image fusion framework based on knowledge distillation. By adopting a teacher-student network structure, the framework leverages knowledge distillation to reduce network complexity, ensuring that the fused images are not only visually enhanced but also well suited for downstream high-level vision tasks. In addition, two discriminators further optimize the overall quality of the fused images, while an integrated semantic segmentation module ensures that the fused images effectively support high-level vision tasks. To enhance both fusion performance and segmentation capability, a joint training strategy is proposed that enables the fusion and segmentation networks to improve each other during training. Experimental results on three public datasets demonstrate that the proposed method outperforms nine state-of-the-art fusion approaches in visual quality, evaluation metrics, and semantic segmentation performance. Finally, ablation studies on the segmentation network further validate the effectiveness of the proposed method.
Siling Feng, Qiaoyun Wang, Cong Lin, Mengxing Huang
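The teacher-student distillation mentioned in the abstract is typically realized by minimizing a KL divergence between temperature-softened output distributions of the two networks. A minimal sketch of that standard loss is below; the function names and the temperature value are illustrative assumptions, not details taken from this paper.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled, numerically stable softmax over a list of logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions.

    Scaled by T^2 so gradient magnitudes stay comparable across
    temperatures (the convention from Hinton et al.'s distillation work).
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's softened predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Matching logits yield (near-)zero loss; diverging logits increase it.
loss_same = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
loss_diff = distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
```

In a fusion framework such as the one described, this term would be added to the student's task losses (e.g., fusion and segmentation objectives), letting the compact student approximate the teacher's behavior at lower complexity.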