RGB-D salient object detection has been one of the most active research topics in computer vision in recent years. It aims to automatically detect and segment salient objects in a scene by combining information from RGB images and depth images. Existing RGB-D salient object detection methods generally operate on each modality directly, without considering the complementarity between modalities, and their multi-modal fusion does not fully explore the differences between modalities. To address these two problems, we propose a cross-guided cross-modal feature fusion network (CCFFNet), composed of a cross-guided feature enhancement (CFE) module and a multi-modal feature fusion (MFF) module. Specifically, in the CFE module, cross-modal feature representations are enhanced by guided learning of mutual feature weights between the RGB and depth branches, fully exploiting the complementarity between the two modalities. In addition, the MFF module draws on modal features from multiple levels and refines the fused features with attention, making the saliency predictions of the model more accurate. Extensive experiments on five benchmark datasets show that our model outperforms seven state-of-the-art methods.
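The cross-guided weighting idea described above can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' implementation): each modality produces channel weights via global average pooling and a sigmoid, and those weights re-scale the *other* modality's features, so RGB and depth guide each other's enhancement. The function name and the residual form are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_guided_enhance(f_rgb, f_depth):
    """Hypothetical sketch of cross-guided feature enhancement.

    f_rgb, f_depth: feature maps of shape (C, H, W).
    Each modality's channel statistics (global average pooling +
    sigmoid) re-weight the other modality's channels; a residual
    connection preserves the original features.
    """
    w_rgb = sigmoid(f_rgb.mean(axis=(1, 2)))      # weights derived from RGB
    w_depth = sigmoid(f_depth.mean(axis=(1, 2)))  # weights derived from depth
    # Cross guidance: apply each modality's weights to the other modality.
    f_rgb_enh = f_rgb * w_depth[:, None, None] + f_rgb
    f_depth_enh = f_depth * w_rgb[:, None, None] + f_depth
    return f_rgb_enh, f_depth_enh

rng = np.random.default_rng(0)
f_rgb = rng.standard_normal((4, 8, 8))
f_depth = rng.standard_normal((4, 8, 8))
f_rgb_enh, f_depth_enh = cross_guided_enhance(f_rgb, f_depth)
```

In a real network these weights would typically be produced by small learned layers rather than raw pooled statistics; the sketch only conveys the mutual-guidance structure.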
Zhenyu Zhang, Huiyan Chen, Qingzhen Xu, Qiang Chen
Shuaihui Wang, Fengyi Jiang, Boqian Xu
Bojian Chen, Wenbin Wu, Zhezhou Li, Tengfei Han, Zhuolei Chen, Weihao Zhang
Yanbin Peng, Zhinian Zhai, Mingkun Feng
Hongbo Bi, Jiayuan Zhang, Ranwan Wu, Yuyu Tong, Wei Jin