Thanks to recent developments in CNNs and deep learning, solid improvements have been made in semantic segmentation. However, most previous work on semantic segmentation targets autonomous driving and does not fully take into account the specific difficulties of high-resolution remote sensing imagery. One such difficulty is that objects in remote sensing imagery are small and crowded, with large intra-class scale differences. To tackle this challenging task, we propose a novel architecture that adopts an encoder-decoder structure, a multi-scale dilated convolution module with spatial attention and separable convolution (Global Attention Pyramid), and a channel attention decoder (Attention Decoder). The proposed Global Attention Pyramid module addresses these problems by enlarging the receptive field without reducing the resolution of feature maps and by applying pixel-level attention. The proposed Attention Decoder module addresses them by providing global context for selecting category localization details. We tested our network on two satellite imagery datasets and obtained remarkably good results on both, especially for small objects. On the DeepGlobe road extraction dataset, our network improves performance from 0.6341 to 0.6510.
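The abstract's key claim for the Global Attention Pyramid is that dilated convolution enlarges the receptive field without reducing feature-map resolution. The paper's own implementation is not reproduced here; as a minimal NumPy sketch of that one idea on a 1-D toy signal (the function name and the "same"-padding choice are illustrative assumptions, not the authors' code):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1-D dilated convolution (toy sketch, not the paper's code).

    The output keeps the input's length (resolution is preserved), while the
    receptive field grows to (k - 1) * dilation + 1 for a kernel of size k.
    """
    k = len(kernel)
    pad = (k - 1) * dilation // 2          # symmetric zero padding for 'same' output
    xp = np.pad(x, pad)
    out = np.zeros(len(x), dtype=float)
    for i in range(len(x)):
        for j in range(k):
            # taps are spaced `dilation` samples apart instead of adjacent
            out[i] += kernel[j] * xp[i + j * dilation]
    return out

x = np.arange(16, dtype=float)
kernel = np.array([1.0, 1.0, 1.0])
for d in (1, 2, 4):
    y = dilated_conv1d(x, kernel, d)
    receptive_field = (len(kernel) - 1) * d + 1
    print(f"dilation={d}: output length={len(y)}, receptive field={receptive_field}")
```

The output length stays at 16 for every dilation rate while the receptive field grows from 3 to 9, which is why stacking branches with different dilation rates (as in a pyramid module) captures multi-scale context at full resolution.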