Diffusion-based image super-resolution (SR) methods have demonstrated remarkable performance. Recent advancements have introduced deterministic sampling processes that reduce inference from 15 iterative steps to a single step, thereby significantly improving the inference speed of existing diffusion models. However, single-step inference limits their performance in semantically complex regions. To address this limitation, we propose SAMSR, a semantic-guided diffusion framework that incorporates semantic segmentation masks into the sampling process. Specifically, we introduce the SAM-Noise Module, which refines Gaussian noise using segmentation masks to preserve spatial and semantic features. Furthermore, we develop a pixel-wise sampling strategy that dynamically adjusts the residual transfer rate and noise strength based on pixel-level semantic weights, prioritizing semantically rich regions during the diffusion process. To enhance model training, we also propose a semantic consistency loss, which aligns pixel-wise semantic weights between predictions and ground truth. Extensive experiments on both real-world and synthetic datasets demonstrate that SAMSR significantly improves perceptual quality and detail recovery, particularly in semantically complex images.
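The pixel-wise sampling idea above can be illustrated with a minimal sketch. The abstract does not specify the exact formulas, so everything below is a hypothetical illustration: `semantic_weights` is an assumed way to turn segmentation masks into per-pixel weights, and the `1 + w` / `1 - 0.5 * w` modulations of the residual transfer rate and noise strength are placeholder choices, not the paper's actual schedule.

```python
import numpy as np

def semantic_weights(masks):
    """Toy per-pixel semantic weight: fraction of segmentation masks
    covering each pixel, normalized to [0, 1].
    `masks` is a (K, H, W) boolean array of K segment masks."""
    coverage = masks.sum(axis=0).astype(float)
    return coverage / max(coverage.max(), 1.0)

def modulated_step(x_t, residual, base_eta, base_sigma, w, rng):
    """One illustrative sampling step: semantically rich pixels
    (large w) receive a larger residual transfer rate and weaker
    noise, so detail-heavy regions are prioritized."""
    eta = base_eta * (1.0 + w)            # hypothetical per-pixel transfer rate
    sigma = base_sigma * (1.0 - 0.5 * w)  # hypothetical per-pixel noise strength
    noise = rng.standard_normal(x_t.shape)
    return x_t + eta * residual + sigma * noise
```

A semantic consistency loss in the same spirit could then compare `semantic_weights` computed on the prediction against those of the ground truth, e.g. with a pixel-wise L1 distance.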
Chuanren Liu, Zhenyu Zhang, Hao Tang