The analysis of remote-sensing images can be of great importance, as it directly affects people's lives through applications such as monitoring environmental changes or traffic congestion in cities. Describing the content of a remote-sensing image in natural language and segmenting the image into semantic classes can support this analysis. Deep learning architectures have proved effective in many applications, including computer vision tasks. For segmentation, we use the U-Net deep learning model. For captioning, we test a CNN-Transformer-based image captioning model with different CNN configurations and model architectures. We also apply an image-processing step to the segmentation map produced by the U-Net to further analyze the image and augment the predicted caption with additional useful information. Our approach enriches the captions of remote-sensing images by exploiting the segmentation output.
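The abstract does not specify how the segmentation output augments the caption; one plausible reading is that per-class pixel statistics from the U-Net's label map are appended to the predicted caption. The sketch below illustrates that idea under stated assumptions: the class names, the `augment_caption` helper, and the coverage threshold are all hypothetical, not taken from the paper.

```python
import numpy as np

# Hypothetical land-cover classes; the actual classes depend on the dataset used.
CLASS_NAMES = {0: "background", 1: "buildings", 2: "roads", 3: "water", 4: "vegetation"}

def augment_caption(caption: str, mask: np.ndarray, min_fraction: float = 0.05) -> str:
    """Append coverage statistics derived from a segmentation mask to a caption.

    `mask` is a 2-D array of integer class labels, e.g. the per-pixel argmax
    of a U-Net's class probabilities.
    """
    total = mask.size
    parts = []
    for label, name in CLASS_NAMES.items():
        if name == "background":
            continue
        frac = np.count_nonzero(mask == label) / total
        if frac >= min_fraction:  # report only classes covering a meaningful area
            parts.append(f"{frac:.0%} {name}")
    if not parts:
        return caption
    return caption.rstrip(".") + " (" + ", ".join(parts) + " by area)."

# Toy 4x4 label map: a water region (3) above a vegetation region (4).
mask = np.array([
    [3, 3, 0, 0],
    [3, 3, 0, 0],
    [4, 4, 4, 4],
    [4, 4, 4, 4],
])
print(augment_caption("A river next to a green field", mask))
# → A river next to a green field (25% water, 50% vegetation by area).
```

In practice the same idea extends to connected-component counts or region locations, but the pixel-fraction summary is the simplest way to turn a segmentation map into extra caption text.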