Liang Liao, Liang Wan, Mingsheng Liu, Shusheng Li
In application scenarios that require semantic segmentation, such as autonomous driving, real-time performance is often a higher priority than peak segmentation accuracy. To achieve a good trade-off between speed and accuracy, the two-branch architecture has been proposed in recent years. It processes spatial information and semantic information separately, which allows the model to be composed of two lightweight networks. However, fusing features at two different scales has become a performance bottleneck for many current two-branch models. In this work, we design a new fusion mechanism for the two-branch architecture that is guided by attention computation. Specifically, our proposed Dual-Guided Attention (DGA) module replaces some multi-scale transformations with attention: a few attention layers of near-linear complexity achieve performance comparable to the commonly used multi-layer fusion. To make this module effective, we build one of the two branches of our networks with Residual U-blocks (RSU), which yield better multi-scale features. Extensive experiments on the Cityscapes and CamVid datasets show the effectiveness of our method. On Cityscapes, the light version of our network, without pretrained weights, achieves 71.1% mIoU at 163 FPS on a single Nvidia RTX 3070 with full-resolution images (1024×2048 px), and the large version achieves 77.9% mIoU at 43 FPS, which still meets the real-time criterion. Our code and module are open-sourced at https://github.com/LikeLidoA/Mymodule.
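The attention-guided fusion of two branches at different scales can be sketched as a single cross-attention step, where high-resolution spatial tokens query low-resolution semantic tokens. This is only an illustrative NumPy sketch under assumed shapes and names (`dual_guided_fusion`, `wq`, `wk`, `wv` are hypothetical), not the authors' actual DGA implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_guided_fusion(spatial, semantic, wq, wk, wv):
    # spatial:  (Ns, C) high-resolution tokens from the spatial branch
    # semantic: (Nm, C) low-resolution tokens from the semantic branch
    q = spatial @ wq                 # queries from the spatial branch
    k = semantic @ wk                # keys from the semantic branch
    v = semantic @ wv                # values from the semantic branch
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]))  # (Ns, Nm) attention map
    return spatial + attn @ v        # residual fusion back into spatial features

rng = np.random.default_rng(0)
C = 8
spatial = rng.standard_normal((64, C))   # e.g. an 8x8 spatial feature grid
semantic = rng.standard_normal((16, C))  # e.g. a 4x4 semantic feature grid
wq, wk, wv = (rng.standard_normal((C, C)) for _ in range(3))
fused = dual_guided_fusion(spatial, semantic, wq, wk, wv)
print(fused.shape)  # -> (64, 8)
```

Because the semantic grid is kept small, the attention cost is O(Ns x Nm), which grows near-linearly in the number of high-resolution tokens, in the spirit of the near-linear complexity claimed for the DGA layers.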