JOURNAL ARTICLE

Cascaded Hierarchical Attention with Adaptive Fusion for Visual Grounding in Remote Sensing

Huming Zhu, Tianqi Gao, Zhixian Li, Zhipeng Chen, Qiuming Li, Kongmiao Miao, Biao Hou, Licheng Jiao

Year: 2025 Journal: Remote Sensing Vol: 17 (17) Pages: 2930-2930 Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Visual grounding for remote sensing (RSVG) is the task of localizing a referred object in remote sensing (RS) images by parsing free-form language descriptions. However, RSVG suffers from low detection accuracy caused by unbalanced multi-scale grounding capability: large objects are grounded more accurately than small objects. Based on Faster R-CNN, we propose Faster R-CNN in Visual Grounding for Remote Sensing (FR-RSVG), a two-stage method for grounding RS objects. Building on this foundation, and to improve the grounding of multi-scale objects, we propose Faster R-CNN with Adaptive Vision-Language Fusion (FR-AVLF), which introduces a layered Adaptive Vision-Language Fusion (AVLF) module. Specifically, this method adaptively fuses deep or shallow visual features according to the input text (e.g., location-related descriptions or descriptions of object characteristics), thereby optimizing semantic feature representation and improving grounding accuracy for objects of different scales. Because RSVG is essentially an extension of RS object detection, we further propose Faster R-CNN with Adaptive Vision-Language Fusion Pretrained (FR-AVLFPRE), which draws on the knowledge the model acquired in prior RS object detection tasks. To further enhance model performance, we propose Faster R-CNN with Cascaded Hierarchical Attention Grounding and Multi-Level Adaptive Vision-Language Fusion Pretrained (FR-CHAGAVLFPRE), which introduces a cascaded hierarchical attention grounding mechanism, employs a more advanced language encoder, and extends AVLF to Multi-Level AVLF, significantly improving localization accuracy in complex scenarios. Extensive experiments on the DIOR-RSVG dataset demonstrate that our model surpasses most existing advanced models.
To validate the generalization capability of our model, we conducted zero-shot inference experiments on the categories that DIOR-RSVG shares with the Complex Description DIOR-RSVG (DIOR-RSVG-C) and OPT-RSVG datasets, achieving performance superior to most existing models.
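
The abstract states that AVLF adaptively fuses deep or shallow visual features according to the input text, but it does not give the module's architecture. The following is a minimal illustrative sketch of one generic way such text-conditioned fusion can work: a softmax gate computed from a text embedding weights feature vectors from different depths. It is not the authors' AVLF implementation; the function names, the `gate_weights` parameters, and the toy vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(level_features, text_embedding, gate_weights):
    """Fuse per-level visual features with text-conditioned weights.

    level_features: list of K feature vectors (shallow to deep), each length D
    text_embedding: text vector of length T
    gate_weights:   K vectors of length T (hypothetical learned parameters)
    Returns the fused vector (length D) and the K fusion weights.
    """
    # One score per feature level: dot product of the text embedding
    # with that level's gate vector.
    scores = [
        sum(w * t for w, t in zip(row, text_embedding)) for row in gate_weights
    ]
    alphas = softmax(scores)
    dim = len(level_features[0])
    # Weighted sum of the level features, element by element.
    fused = [
        sum(a * feat[d] for a, feat in zip(alphas, level_features))
        for d in range(dim)
    ]
    return fused, alphas


# Toy example (assumed values): the text embedding aligns with the gate
# vector of the shallow level, so the shallow features get the larger weight.
shallow = [1.0, 0.0]
deep = [0.0, 1.0]
text_embedding = [1.0, 0.0]
gate_weights = [[2.0, 0.0], [0.0, 2.0]]

fused, alphas = adaptive_fuse([shallow, deep], text_embedding, gate_weights)
print(alphas, fused)
```

In this toy setup the fusion weights sum to one and shift toward whichever level the text favors. A real model would learn the gate parameters and operate on spatial feature maps rather than single vectors.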

Keywords:
Remote sensing, Computer science, Fusion, Environmental science, Geology

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 52
Citation Normalized Percentile: 0.43

Topics

Remote-Sensing Image Classification
Physical Sciences →  Engineering →  Media Technology
Advanced Image Fusion Techniques
Physical Sciences →  Engineering →  Media Technology
Remote Sensing and Land Use
Physical Sciences →  Earth and Planetary Sciences →  Atmospheric Science

Related Documents

JOURNAL ARTICLE

Adaptive Scale Fusion via Uncertainty Estimation for Visual Grounding in Remote Sensing Images

Zhipeng Zhang, Yang Zou, Ji Wang, Peng Wang

Journal: IEEE Transactions on Geoscience and Remote Sensing Year: 2025 Vol: 64 Pages: 1-12
JOURNAL ARTICLE

Improving visual grounding in remote sensing images with adaptive modality guidance

Shabnam Choudhury, Pratham Kurkure, Biplab Banerjee

Journal: ISPRS Journal of Photogrammetry and Remote Sensing Year: 2025 Vol: 224 Pages: 42-58
JOURNAL ARTICLE

Language-Guided Progressive Attention for Visual Grounding in Remote Sensing Images

Ke Li, Di Wang, Haojie Xu, Haodi Zhong, Cong Wang

Journal: IEEE Transactions on Geoscience and Remote Sensing Year: 2024 Vol: 62 Pages: 1-13
JOURNAL ARTICLE

Visual Grounding in Remote Sensing Images

Yuxi Sun, Shanshan Feng, Xutao Li, Yunming Ye, Jian Kang, Xu Huang

Journal: Proceedings of the 30th ACM International Conference on Multimedia Year: 2022 Pages: 404-412
JOURNAL ARTICLE

Hierarchical Attention and Bilinear Fusion for Remote Sensing Image Scene Classification

Donghang Yu, Haitao Guo, Qing Xu, Jun Lu, Chuan Zhao, Yuzhun Lin

Journal: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing Year: 2020 Vol: 13 Pages: 6372-6383