Shengli JiangGuihong LiWei YaoZhenjie HongTae‐Yong Kuc
Abstract. Semantic segmentation is a fundamental research task in computer vision, which intends to assign a certain category to every pixel. Currently, most existing methods only utilize the deepest feature map for decoding, while high-level features get inevitably lost during the procedure of down-sampling. In the decoder section, transposed convolution or bilinear interpolation was widely used to restore the size of the encoded feature map; however, few optimizations are applied during up-sampling process which is detrimental to the performance for grouping and classification. In this work, we proposed a dual pyramids encoder-decoder deep neural network (DPEDNet) to tackle the above issues. The first pyramid integrated and encoded multi-resolution features through sequentially stacked merging, and the second pyramid decoded the features through dense atrous convolution with chained up-sampling. Without post-processing and multi-scale testing, the proposed network has achieved state-of-the-art performances on two challenging benchmark image datasets for both ground and aerial view scenes.
S. NivethaR. VaishnaviRajagopal RekhaM. Indhuja
Rajagopal RekhaM. IndhujaS. NivethaR. Vaishnavi
J. S. LewisYoung‐Jin ChaJongho Kim
Chengxin LiuShuaiyuan DuHao LüDehui LiZhiguo Cao
Xiaopin ZhaoWeibin LiuWeiwei Xing