Semantic segmentation aims to assign a correct category label to every pixel in an image, so both pixel-level classification and spatial locality are essential. To this end, previous methods fuse the multi-scale features of a backbone network, combining high-level category-discriminative cues with low-level location details; other methods incorporate contextual cues or boundary guidance. However, these methods typically refine the fused features only after the last fusion stage, under the guidance of context or boundary, resulting in a separate processing pipeline. Such a post-processing strategy can hardly remedy the context inconsistency and blurred boundaries within the fused features, and inevitably produces object segmentations with inconsistent context and blob-like contours. In this paper, we aim to bridge this gap by seamlessly introducing context and boundary guidance into each of the multi-scale feature fusion operations. In this way, multi-scale features are effectively combined while maintaining context consistency and sharp object boundaries, leading to enhanced semantic segment coherence. Experimental results on the Cityscapes and ADE20K datasets show the superiority of the proposed method.
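To make the idea of guidance-modulated fusion concrete, the following is a minimal NumPy sketch of one fusion stage; the function names (`upsample2x`, `guided_fuse`) and the per-pixel gating scheme are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def guided_fuse(low, high, guidance):
    """Fuse a coarse, high-level feature map into a fine, low-level one,
    modulated per pixel by a guidance map (e.g. context or boundary cues).

    low:      (C, H, W)     fine-resolution, detail-rich features
    high:     (C, H/2, W/2) coarse-resolution, semantically strong features
    guidance: (1, H, W)     gate in [0, 1]; 1 means trust high-level semantics
    """
    high_up = upsample2x(high)
    # Convex combination: guidance steers each pixel between semantic
    # context (high_up) and boundary-preserving detail (low).
    return guidance * high_up + (1.0 - guidance) * low
```

Applying such a gate at every fusion stage, rather than once after the final stage, is the contrast the abstract draws with post-processing pipelines.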
Quan Zhou, Linjie Wang, Guangwei Gao, Bin Kang, Weihua Ou, Huimin Lu