JOURNAL ARTICLE

Remote Sensing Image Segmentation and Captioning Using Deep Learning

Abstract

The analysis of remote-sensing images can be of great importance because it directly affects people's lives, for example in monitoring environmental changes or traffic congestion in cities. Describing the content of a remote-sensing image in natural language and segmenting the image into semantic classes can both support this analysis. Deep learning architectures have proved effective in many applications, including computer vision tasks. For segmentation, we used the U-Net deep learning model. For captioning, a CNN-Transformer image-captioning model was tested with different CNN configurations and model architectures. We also applied an image-processing technique to the segmented image produced by the U-Net to further analyze the image and augment the predicted caption with more useful information. Our approach enriches the captions of remote-sensing images by exploiting the segmentation output.
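The caption-augmentation step described above could be sketched roughly as follows: compute per-class coverage from the segmentation mask and append the dominant classes to the predicted caption. This is a minimal illustrative sketch, not the paper's actual implementation; the class labels, names, and coverage threshold are assumptions.

```python
from collections import Counter

# Hypothetical label-to-name mapping for a remote-sensing segmentation map
# (illustrative assumption; the paper's class set may differ).
CLASS_NAMES = {0: "background", 1: "water", 2: "buildings", 3: "roads", 4: "vegetation"}

def augment_caption(caption, mask, min_fraction=0.05):
    """Append class-coverage statistics derived from a segmentation mask
    (a 2-D grid of integer class labels) to a predicted caption."""
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    parts = []
    for label, count in counts.most_common():
        fraction = count / total
        # Report only named classes that cover a meaningful share of the image.
        if label in CLASS_NAMES and fraction >= min_fraction:
            parts.append(f"{CLASS_NAMES[label]} covers {fraction:.0%} of the image")
    if not parts:
        return caption
    return caption.rstrip(".") + ". " + "; ".join(parts) + "."

# Toy 4x4 segmentation mask standing in for a U-Net output.
mask = [
    [1, 1, 2, 2],
    [1, 1, 2, 3],
    [4, 4, 3, 3],
    [4, 4, 4, 4],
]
print(augment_caption("A residential area near a lake", mask))
```

In a real pipeline the mask would come from the U-Net's per-pixel argmax rather than a hand-written grid, and the coverage threshold would be tuned to avoid cluttering the caption with minor classes.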

Keywords:
Image captioning, Image segmentation, Deep learning, Computer vision, Transformer, Image processing, Pattern recognition, Remote sensing

Metrics

Cited by: 2
FWCI (Field-Weighted Citation Impact): 0.36
References: 24
Citation Normalized Percentile: 0.58

Topics

Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Advanced Image and Video Retrieval Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Advanced Neural Network Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)