Congcong Wang, Faouzi Alaya Cheikh, Azeddine Beghdadi, Ole Jakob Elle
Object sizes in images vary widely, so capturing multi-scale context information is essential for semantic segmentation. Existing context aggregation methods such as the pyramid pooling module (PPM) and atrous spatial pyramid pooling (ASPP) employ different pooling sizes or atrous rates to capture multi-scale information. However, these pooling sizes and atrous rates are chosen empirically. Rethinking ASPP leads to our observation that learnable sampling locations of the convolution operation can endow the network with a learnable field-of-view, and thus the ability to capture object context information adaptively. Following this observation, in this paper we propose an adaptive context encoding (ACE) module based on the deformable convolution operation, in which the sampling locations of the convolution are learnable. Our ACE module can easily be embedded into other Convolutional Neural Networks (CNNs) for context aggregation. The effectiveness of the proposed module is demonstrated on the Pascal-Context and ADE20K datasets. Although our proposed ACE consists of only three deformable convolution blocks, it outperforms PPM and ASPP in terms of mean Intersection over Union (mIoU) on both datasets. The experimental studies confirm that our proposed module is effective compared with state-of-the-art methods.
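To make the idea of learnable sampling locations concrete, the following is a minimal pure-Python sketch of how a single 3x3 deformable convolution response could be computed. It is not the ACE module itself: the function names `bilinear` and `deform_conv_at` are hypothetical, the offsets are passed in by hand (in a real network they would be predicted by a learned layer), and zero padding outside the feature map is an assumption.

```python
import math


def bilinear(feat, y, x):
    """Sample the 2D list `feat` at a fractional location (y, x).

    Locations outside the map contribute 0 (zero padding, an assumption).
    """
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    # Blend the four integer neighbours with bilinear weights.
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                val += wy * wx * feat[yy][xx]
    return val


def deform_conv_at(feat, weights, offsets, cy, cx, dilation=1):
    """Response of a 3x3 deformable convolution at output position (cy, cx).

    weights: 3x3 list of kernel weights.
    offsets: 3x3 list of (dy, dx) pairs added to the regular grid positions;
             supplied by hand here, learned in a real network.
    dilation: atrous rate of the regular grid before offsets are added.
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            y = cy + (i - 1) * dilation + dy
            x = cx + (j - 1) * dilation + dx
            out += weights[i][j] * bilinear(feat, y, x)
    return out


# Toy feature map with value 5*row + col, a kernel of ones, and two offset fields.
feat = [[5 * r + c for c in range(5)] for r in range(5)]
ones = [[1.0] * 3 for _ in range(3)]
zero_offsets = [[(0.0, 0.0)] * 3 for _ in range(3)]
half_offsets = [[(0.5, 0.5)] * 3 for _ in range(3)]

regular = deform_conv_at(feat, ones, zero_offsets, 2, 2)
shifted = deform_conv_at(feat, ones, half_offsets, 2, 2)
```

With all offsets set to zero the function reduces to an ordinary (optionally dilated) convolution, which is the fixed-grid case that ASPP uses with hand-picked atrous rates. Non-zero offsets move each sampling point independently, so the effective field-of-view is no longer tied to a preset rate.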
Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal
Tingting Liang, Qijie Zhao, Zhuoying Wang, Kaiyu Shan, Huan Zhang, Yongtao Wang, Zhi Tang
Junjun He, Zhongying Deng, Lei Zhou, Yali Wang, Yu Qiao
Guodong Zhang, Wenzhu Yang, Guoyu Zhou
Hao Liu, Yulan Guo, Yanni Ma, Yinjie Lei, Gongjian Wen