At present, most image semantic segmentation methods are based on Fully Convolutional Networks (FCN). However, FCN loses image feature information during segmentation, and fine details in the output image are handled poorly. We therefore propose using the ResNet network as the encoder backbone, applying dilated convolution to extract context information, and designing a multi-scale feature fusion method in the decoder that makes full use of features from every level to enrich the representational ability of feature points, so that image pixels can be classified well. Extensive experiments demonstrate that our method outperforms other methods on the PASCAL VOC2012 [10] validation dataset.
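To make the role of dilated convolution concrete, the following is a minimal one-dimensional sketch in plain Python. It is not the network described above, which applies two-dimensional dilated convolutions inside a ResNet encoder; the function names `dilated_conv1d` and `receptive_field` are hypothetical and chosen only for illustration. The sketch shows that increasing the dilation rate widens the span of input seen by each output value without adding kernel weights, which is how context information can be gathered without further downsampling.

```python
def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one output of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D dilated convolution (cross-correlation form, no padding).

    Illustrative assumption: a 1D list stands in for a 2D feature map.
    """
    span = receptive_field(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for k, weight in enumerate(kernel):
            # Kernel taps are spaced `dilation` positions apart.
            acc += weight * signal[start + k * dilation]
        out.append(acc)
    return out


if __name__ == "__main__":
    signal = list(range(10))
    kernel = [1, 1, 1]  # same three weights for every dilation rate
    for rate in (1, 2, 4):
        print(rate, receptive_field(len(kernel), rate),
              dilated_conv1d(signal, kernel, rate))
```

With the same three-weight kernel, dilation rates of 1, 2, and 4 span 3, 5, and 9 input positions respectively, so deeper context is reached at no extra parameter cost.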
Dong Seop Kim, Yu Hwan Kim, Kang Ryoung Park
Mengmeng Kang, Hao Yang, Xiaojing Gu, Xingsheng Gu