Chandrashekhar S. Pawar, Ashwin Makwana
Remote sensing image captioning (RSIC) aims to produce detailed, informative textual descriptions of satellite and aerial images. Traditional methods fall short of this goal because variations in scale, viewpoint, and scene complexity limit their contextual awareness. In this paper, we propose the Multiscale Region-Aware Captioning Network (MSR-CapNet), which generates relevant, semantically correct descriptions of scenes in satellite and aerial images. MSR-CapNet integrates three components: Feature Pyramid Encoding, which represents both local and global visual characteristics; Adaptive Attention, which dynamically prioritizes relevant image regions; and Topic-Sensitive Embeddings, which keep the generated captions semantically consistent. We train and evaluate MSR-CapNet on the RSICD and UCM caption datasets, comparing it against existing techniques, including recent transformer- and graph-based baselines, using the BLEU-4, METEOR, and CIDEr metrics, where it shows consistent improvements.
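To illustrate the adaptive-attention idea mentioned above, here is a minimal pure-Python sketch of attention with a "visual sentinel": the decoder scores each multiscale region feature against its hidden state, and a sentinel vector lets it fall back on non-visual (language) context when no region is relevant. All names (`adaptive_attention`, `region_feats`, `sentinel`) are illustrative assumptions, not the paper's actual implementation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def adaptive_attention(region_feats, hidden, sentinel):
    """Attend over multiscale region features plus a visual sentinel.

    region_feats : list of D-dim region vectors (e.g. from a feature pyramid)
    hidden       : D-dim decoder hidden state at the current time step
    sentinel     : D-dim fallback vector for non-visual (language) context

    Returns the mixed context vector and the attention weights; the last
    weight is the sentinel gate (how much the decoder ignores the image).
    """
    scores = [dot(r, hidden) for r in region_feats] + [dot(sentinel, hidden)]
    weights = softmax(scores)
    beta = weights[-1]  # sentinel gate
    dim = len(hidden)
    # Context = attention-weighted regions + sentinel contribution
    context = [
        beta * sentinel[d]
        + sum(weights[i] * region_feats[i][d] for i in range(len(region_feats)))
        for d in range(dim)
    ]
    return context, weights
```

The sentinel weight already absorbs part of the softmax mass, so the region weights and the gate sum to one by construction; this is one common way to realize adaptive attention, not necessarily the variant used in MSR-CapNet.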