RGB-D salient object detection (SOD) is a crucial preprocessing step for diverse vision tasks. Despite progress in deep learning-based approaches, RGB-D SOD still faces persistent challenges. Multimodal images carry information of hierarchical significance and therefore require hierarchical processing; moreover, the information in depth maps is not completely reliable, and forced fusion can introduce noise that negatively affects detection results. To address these issues, the Semantic and Detail Fusion Network (SDF-Net) is proposed. SDF-Net first fuses depth maps and RGB maps hierarchically, from low to high level, by means of a Depth Refinement Module (DRM). Next, the high-level semantic features are fed into the Gradual Expansion Module (GEM) in the standard decoder, and its outputs are passed to the lower layers to complement them with low-level details. Finally, the salient regions are refined from top to bottom by the Recursive Transposition Module (RTM) to further sharpen the edges. Experiments on 5 datasets against 10 state-of-the-art methods demonstrate the effectiveness of our method both quantitatively and qualitatively.
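The overall data flow described above (gated per-level fusion of RGB and depth features, followed by top-down decoding that complements semantics with low-level detail) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names `drm_fuse`, `top_down_decode`, and `upsample2x`, the per-level reliability gates, and the toy feature pyramid are all hypothetical stand-ins for the learned DRM, GEM, and RTM modules.

```python
import numpy as np

def drm_fuse(rgb_feat, depth_feat, gate):
    # Hypothetical stand-in for the Depth Refinement Module: scale the
    # depth features by a reliability gate before fusing, so that
    # unreliable depth contributes less noise to the fused features.
    return rgb_feat + gate * depth_feat

def upsample2x(x):
    # Nearest-neighbour 2x upsampling (stand-in for the learned
    # upsampling inside the decoder).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def top_down_decode(fused_feats):
    # Hypothetical top-down decoding: start from the highest-level
    # semantic features (coarsest map) and progressively add
    # lower-level detail, as GEM/RTM do in the paper.
    out = fused_feats[-1]
    for feat in reversed(fused_feats[:-1]):
        out = upsample2x(out) + feat  # complement with low-level details
    return out

# Toy 3-level feature pyramid (spatial sizes 8x8, 4x4, 2x2).
rgb   = [np.ones((8, 8)), np.ones((4, 4)), np.ones((2, 2))]
depth = [np.ones((8, 8)), np.ones((4, 4)), np.ones((2, 2))]
gates = [0.9, 0.5, 0.1]  # hypothetical per-level depth-reliability gates

fused = [drm_fuse(r, d, g) for r, d, g in zip(rgb, depth, gates)]
saliency = top_down_decode(fused)
print(saliency.shape)  # (8, 8): coarse semantics refined back to full scale
```

The sketch only conveys the hierarchical fuse-then-decode structure; in the actual network each stage is a learned convolutional module rather than a fixed arithmetic rule.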
Yanhua Liang, Guihe Qin, Minghui Sun, Jun Qin, Jie Yan, Zhonghan Zhang