Chandrashekhar S. Pawar, Ashwin Makwana
Remote sensing image captioning (RSIC) aims to produce detailed, informative textual descriptions of satellite and aerial images. Traditional methods fall short of this goal because variations in scale, viewpoint, and scene complexity limit their contextual awareness. In this paper, we propose the Multiscale Region-Aware Captioning Network (MSR-CapNet), which generates relevant, semantically correct descriptions of scenes in satellite and aerial images. MSR-CapNet integrates three components: Feature Pyramid Encoding, which represents both local and global visual characteristics; Adaptive Attention, which dynamically prioritizes relevant image regions; and Topic-Sensitive Embeddings, which keep the generated captions semantically consistent. We train and evaluate MSR-CapNet on the RSICD and UCM caption datasets, comparing it against existing techniques, including recent transformer- and graph-based baselines, using the BLEU-4, METEOR, and CIDEr metrics, where it shows consistent improvements.
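To illustrate the adaptive-attention idea mentioned above, here is a minimal pure-Python sketch of attention with a "visual sentinel": the decoder scores each multiscale region feature against its hidden state, and a sentinel vector lets it fall back on non-visual (language) context when no region is relevant. All names (`adaptive_attention`, `region_feats`, `sentinel`) are illustrative assumptions, not the paper's actual implementation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def adaptive_attention(region_feats, hidden, sentinel):
    """Attend over multiscale region features plus a visual sentinel.

    region_feats : list of D-dim region vectors (e.g. from a feature pyramid)
    hidden       : D-dim decoder hidden state at the current time step
    sentinel     : D-dim fallback vector for non-visual (language) context

    Returns the mixed context vector and the attention weights; the last
    weight is the sentinel gate (how much the decoder ignores the image).
    """
    scores = [dot(r, hidden) for r in region_feats] + [dot(sentinel, hidden)]
    weights = softmax(scores)
    beta = weights[-1]  # sentinel gate
    dim = len(hidden)
    # Context = attention-weighted regions + sentinel contribution
    context = [
        beta * sentinel[d]
        + sum(weights[i] * region_feats[i][d] for i in range(len(region_feats)))
        for d in range(dim)
    ]
    return context, weights
```

The sentinel weight already absorbs part of the softmax mass, so the region weights and the gate sum to one by construction; this is one common way to realize adaptive attention, not necessarily the variant used in MSR-CapNet.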