Most advanced CNN-based semantic segmentation networks rely on an ImageNet-pretrained backbone and adopt multi-scale context perception modules, giving the network multi-scale information perception capabilities. This approach effectively improves segmentation accuracy, but a pretrained backbone incurs training costs, and the additional modules increase the parameter count. Drawing inspiration from ResNeXt, we propose a lightweight module with parallel branches and multi-scale perception capabilities that combines group convolution and dilated convolution. The feature extraction backbone of our semantic segmentation network is built from this module, giving the network multi-scale feature perception and reducing the loss of spatial information caused by fast downsampling, without resorting to extra specialized multi-scale perception modules. In addition, a dual-branch multi-scale input strategy combined with bilateral interaction is used in the encoder to ensure that the network fully learns multi-scale information, and a lightweight stepwise upsampling decoder is proposed for decoding. Our DMSegNet achieves 76.7% mIoU on Cityscapes and 80.9% mIoU on CamVid with only 3.2M parameters, remaining lightweight.
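The idea of parallel branches combining group convolution and dilated convolution can be illustrated with a minimal PyTorch sketch. This is not the paper's exact design: the class name, dilation rates, group count, and residual fusion are illustrative assumptions, shown only to make the multi-scale mechanism concrete.

```python
import torch
import torch.nn as nn


class MultiScaleGroupBlock(nn.Module):
    """Hypothetical sketch of a lightweight multi-scale block:
    parallel grouped 3x3 convolutions with different dilation rates
    (so each branch sees a different receptive field at low parameter
    cost), concatenated and fused by a 1x1 convolution with a
    residual connection. Rates and groups are illustrative."""

    def __init__(self, channels, dilations=(1, 2, 4, 8), groups=4):
        super().__init__()
        assert channels % len(dilations) == 0
        branch_ch = channels // len(dilations)
        self.branches = nn.ModuleList([
            nn.Sequential(
                # grouped + dilated 3x3 conv; padding=d keeps spatial size
                nn.Conv2d(channels, branch_ch, 3, padding=d, dilation=d,
                          groups=groups, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x):
        # concatenate the multi-scale branch outputs along channels
        out = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(out) + x  # residual connection


x = torch.randn(1, 64, 32, 32)
block = MultiScaleGroupBlock(64)
y = block(x)
```

Because each branch produces only `channels / len(dilations)` output channels and uses grouped kernels, the block stays far cheaper than stacking full-width dilated convolutions, which matches the lightweight goal described above.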
Haoran Yang, Dan Zhang, Jiazai Liu, Zekun Cao, Na Wang, Jiayin Teng, Tong Zhang, Guochao Fan, Buer Song, Han Pan, Xuyang Zhang, Weizhi Qu, Weizhen Wang, Suyu Wang, Yue Li, Yishu Jin