Linrui Shi, Gaochang Wu, Yingqian Wang, Yebin Liu, Tianyou Chai
Thermal imaging offers valuable properties but suffers from inherently low spatial resolution, which can be enhanced using a high-resolution (HR) visible image as guidance. However, the substantial modality differences between thermal and visible images, coupled with significant resolution gaps, pose challenges to existing guided super-resolution (SR) approaches. In this article, we present dual-conditional diffusion (DuaDiff), a diffusion model featuring a dual-conditioning mechanism to enhance guided thermal image SR. Unlike typical conditional diffusion models, DuaDiff integrates a learnable Laplacian pyramid that extracts high-frequency details from the visible image, which serve as one of the conditioning inputs. By capturing multiscale high-frequency components, DuaDiff focuses on intricate textures and edges in the HR visible images, significantly enhancing thermal image fidelity. Furthermore, we project both thermal and visible images into a semantic latent space to construct the second conditioning input. Leveraging these complementary conditions, DuaDiff employs a multimodal latent feature cross-attention module to facilitate effective interaction between noise, thermal, and visible latent representations. Extensive experiments on the FLIR-ADAS and CATS datasets for $4\times$ and $8\times$ guided SR demonstrate that combining learnable Laplacian conditioning with semantic latent conditioning enables DuaDiff to surpass state-of-the-art methods in both visual quality and metric evaluation, particularly in scenarios with a large resolution gap. In addition, applications to downstream tasks further confirm the capability of DuaDiff to recover high-fidelity semantic information. The code will be released.
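To make the notion of multiscale high-frequency components concrete, the following is a minimal sketch of a classical, fixed Laplacian pyramid in plain Python. It is not the DuaDiff implementation: the abstract describes a learnable pyramid whose filters are not specified here, so the 2x2 average pooling, the nearest-neighbour upsampling, and every function name below are illustrative assumptions.

```python
# Illustrative sketch only: a fixed (non-learnable) Laplacian pyramid.
# The pooling/upsampling filters and all names are assumptions, not DuaDiff's.

def downsample(img):
    """Halve resolution with 2x2 average pooling (height and width must be even)."""
    h, w = len(img), len(img[0])
    return [
        [
            (img[2 * i][2 * j] + img[2 * i][2 * j + 1]
             + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w // 2)
        ]
        for i in range(h // 2)
    ]


def upsample(img):
    """Double resolution by repeating every pixel in a 2x2 block."""
    out = []
    for row in img:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def subtract(a, b):
    """Element-wise a - b for two equally sized 2D lists."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def add(a, b):
    """Element-wise a + b for two equally sized 2D lists."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def laplacian_pyramid(img, levels):
    """Return [high_0, ..., high_{levels-1}, low_residual].

    Each high-frequency band is the detail lost when the current scale is
    downsampled and upsampled again; the last entry is the coarsest image.
    """
    bands = []
    current = [[float(v) for v in row] for row in img]
    for _ in range(levels):
        smaller = downsample(current)
        bands.append(subtract(current, upsample(smaller)))
        current = smaller
    bands.append(current)
    return bands


def reconstruct(bands):
    """Invert laplacian_pyramid by upsampling and adding the bands back."""
    current = bands[-1]
    for high in reversed(bands[:-1]):
        current = add(upsample(current), high)
    return current


# Usage: a 4x4 ramp image split into two detail bands plus a 1x1 residual.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
bands = laplacian_pyramid(image, levels=2)
restored = reconstruct(bands)
```

In this fixed version the bands sum back to the input by construction, so no information is lost. A learnable variant would presumably replace the fixed pooling and upsampling with trained filters, at which point the bands become conditioning features and no longer form an exact decomposition.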
Leheng Zhang, Weiyi You, Kexuan Shi, Shuhang Gu
Guoning Chen, Zhenfeng Zhu, Zhizhe Liu, Chen Lin, Shuai Zheng, Hongli Xu, Yao Zhao, Kunlun He
Fanen Meng, Sensen Wu, Laifu Zhang, Haoyu Jing, Yijun Chen, Yiming Yan, Tian Feng, Renyi Liu, Zhenhong Du