JOURNAL ARTICLE

Visual Grounding in Remote Sensing Images

Yuxi Sun, Shanshan Feng, Xutao Li, Yunming Ye, Jian Kang, Xu Huang

Year: 2022, Journal: Proceedings of the 30th ACM International Conference on Multimedia, Pages: 404-412

Abstract

Retrieving ground objects from large-scale remote sensing images is important for many applications. We introduce the problem of visual grounding in remote sensing images. Visual grounding aims to locate particular objects in an image, as a bounding box or segmentation mask, from a natural language expression. The task is well established in the computer vision community, but existing benchmark datasets and methods focus mainly on natural images rather than remote sensing images. Compared with natural images, remote sensing images contain large-scale scenes and carry the geographical spatial information of ground objects (e.g., longitude and latitude). Existing methods cannot handle these challenges. In this paper, we collect a new visual grounding dataset, RSVG, and design a new method, GeoVG. The proposed method consists of a language encoder, an image encoder, and a fusion module. The language encoder learns numerical geospatial relations and represents a complex expression as a geospatial relation graph. The image encoder learns large-scale remote sensing scenes with adaptive region attention. The fusion module fuses the text and image features for visual grounding. We evaluate the proposed method against state-of-the-art methods on RSVG. Experiments show that our method outperforms previous methods on the proposed dataset. https://sunyuxi.github.io/publication/GeoVG
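
The abstract describes grounding output as a bounding box and reports a comparison against other methods, but it does not name the metric. As a hedged illustration only, the sketch below shows one common way bounding-box grounding is scored: a prediction counts as correct when its intersection-over-union (IoU) with the reference box reaches a threshold. The function names `iou` and `accuracy_at_iou`, the `(x_min, y_min, x_max, y_max)` box layout, and the 0.5 threshold are assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(
    preds: Sequence[Box], targets: Sequence[Box], threshold: float = 0.5
) -> float:
    """Fraction of predictions whose IoU with the reference box is >= threshold."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(iou(p, t) >= threshold for p, t in zip(preds, targets))
    return hits / len(preds)
```

Under these assumptions, one expression-to-box prediction per query is passed in `preds` and the annotated box in `targets`; the threshold can be raised to demand tighter localization.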

Keywords:
Computer science, Artificial intelligence, Computer vision, Encoder, Remote sensing, Geography

Metrics

Cited By: 55
FWCI (Field Weighted Citation Impact): 3.80
Refs: 38
Citation Normalized Percentile: 0.95

Topics

Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Transferring CLIP for visual grounding in remote sensing images

Linlin Liang, Yizhuo Quan, Chengbo Wang, Yuanfei Chang, Yanyou Qiao

Journal: International Journal of Digital Earth, Year: 2025, Vol: 18 (1)

JOURNAL ARTICLE

Directional-Semantic-Enhanced Visual Grounding for Remote Sensing Images

Hu Guo, Bin Sun, Shutao Li, Chenglong Lei, Xiliang Li, Mingkui Tan

Journal: IEEE Transactions on Geoscience and Remote Sensing, Year: 2025, Vol: 63, Pages: 1-14

JOURNAL ARTICLE

A Regionally Indicated Visual Grounding Network for Remote Sensing Images

Renlong Hang, Siqi Xu, Qingshan Liu

Journal: IEEE Transactions on Geoscience and Remote Sensing, Year: 2024, Vol: 62, Pages: 1-11

JOURNAL ARTICLE

Language-Guided Progressive Attention for Visual Grounding in Remote Sensing Images

Ke Li, Di Wang, Haojie Xu, Haodi Zhong, Cong Wang

Journal: IEEE Transactions on Geoscience and Remote Sensing, Year: 2024, Vol: 62, Pages: 1-13

JOURNAL ARTICLE

Improving visual grounding in remote sensing images with adaptive modality guidance

Shabnam Choudhury, Pratham Kurkure, Biplab Banerjee

Journal: ISPRS Journal of Photogrammetry and Remote Sensing, Year: 2025, Vol: 224, Pages: 42-58