Wangbo Zhao, Jing Zhang, Long Li, Nick Barnes, Nian Liu, Junwei Han
Significant performance improvements have been achieved in fully supervised video salient object detection using pixel-wise labeled training datasets, which are time-consuming and expensive to obtain. To relieve the burden of data annotation, we present the first weakly supervised video salient object detection model, based on relabeled "fixation guided scribble annotations". Specifically, we propose an appearance-motion fusion module and a bidirectional ConvLSTM based framework to achieve effective multi-modal learning and long-term temporal context modeling from our new weak annotations. Further, we design a novel foreground-background similarity loss to exploit labeling similarity across frames. A weak annotation boosting strategy with a new pseudo-label generation technique is also introduced to further improve model performance. Extensive experimental results on six benchmark video saliency detection datasets demonstrate the effectiveness of our solution.