Many deep learning methods have been proposed for Salient Object Detection (SOD) in natural images; however, they may not transfer well to remote sensing images because they ignore domain knowledge unique to such images. For example, satellite images can contain more complex contexts than natural images, and many of their salient objects are small, but existing deep learning based SOD methods for natural images are not designed with these characteristics in mind. In this paper, we propose a new Transformer-aware Encoder-Decoder Network (TEDNet) for the SOD task in remote sensing images. TEDNet combines a hybrid Convolutional Neural Network-Transformer encoder with a Transformer-enhanced decoder, learning complex context features from local neighbors by convolution and long-range region dependencies by Transformer. Furthermore, to train the proposed TEDNet, we propose a new image-level and pixel-level size-guided loss for mining small salient objects. Experimental results on a public remote sensing SOD dataset demonstrate the effectiveness and accuracy of the proposed method.
Pengwei Dong, Bo Wang, Runmin Cong, Hai-Han Sun, Chongyi Li
Yu Liu, Jie Lin, Gongtao Yue, Zhaosheng Shao, Shanwen Zhang
Longxuan Yu, Xiaofei Zhou, Lingbo Wang, Jiyong Zhang
Xin Wang, Zhilu Zhang, Shihan Jing, Huiyu Zhou
Gongyang Li, Zhen Bai, Zhi Liu, Xinpeng Zhang, Haibin Ling