JOURNAL ARTICLE

Hypersphere-Based Remote Sensing Cross-Modal Text–Image Retrieval via Curriculum Learning

W. Zhang, Jihao Li, Shuoke Li, Jialiang Chen, Wenkai Zhang, Xin Gao, Xian Sun

Year: 2023 · Journal: IEEE Transactions on Geoscience and Remote Sensing · Vol. 61 · Pages: 1–15 · Publisher: Institute of Electrical and Electronics Engineers

Abstract

Remote sensing cross-modal text–image retrieval (RSCTIR) is a flexible, human-centered approach to retrieving rich information across modalities, and it has attracted considerable attention in recent years. It remains challenging because current methods usually ignore the varying difficulty of different sample pairs, which stems from the large differences in image distribution and the high similarity among texts in the remote sensing (RS) field. In this paper, we therefore propose a hypersphere-based visual semantic alignment (HVSA) network trained via curriculum learning. Specifically, we first design an adaptive alignment strategy based on curriculum learning, which aligns RS image–text pairs from easy to hard. Sample pairs of different difficulty levels are treated unequally, which yields a better embedding representation when the features are projected onto the unit hypersphere. Then, to measure the robustness of cross-modal feature alignment on the unit hypersphere, we introduce a feature uniformity strategy, which reduces mismatches and improves generalization. Finally, we design a key-entity attention (KEA) mechanism to alleviate the information imbalance between modalities; KEA extracts information about the key entities that are aligned with the textual description. Despite its conciseness, our framework achieves state-of-the-art performance on the standard RSCTIR benchmark datasets while offering faster inference. The summed recall of HVSA on RSICD and RSITMD is 120.97 and 198.94, which is 2.50 and 10.49 points ahead of the current best methods, respectively. Extensive experiments demonstrate the competitiveness of our method. The code has been released at https://github.com/ZhangWeihang99/HVSA.
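The abstract does not spell out HVSA's loss functions. As a rough illustration of what "projecting features onto the unit hypersphere" and "feature uniformity" usually mean, the sketch below assumes the widely used alignment and uniformity objectives of Wang and Isola (2020), not the paper's exact formulation. Features are L2-normalized so that cosine similarity reduces to a dot product and the squared distance between two unit vectors equals 2 - 2·cos. The function names and the toy 8-D features are hypothetical.

```python
import math
import random


def l2_normalize(v):
    """Project a feature vector onto the unit hypersphere (||v|| = 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def sq_dist(u, v):
    """Squared Euclidean distance; equals 2 - 2*cos(u, v) for unit vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def alignment_loss(img_feats, txt_feats, alpha=2):
    """Mean ||f(x) - g(y)||^alpha over matched image-text pairs (lower = closer).

    Assumption: standard Wang & Isola alignment term, not HVSA's own loss.
    """
    dists = [sq_dist(u, v) ** (alpha / 2) for u, v in zip(img_feats, txt_feats)]
    return sum(dists) / len(dists)


def uniformity_loss(feats, t=2.0):
    """log of the mean exp(-t * ||u - v||^2) over distinct pairs (lower = more spread).

    Assumption: standard Wang & Isola uniformity term, not HVSA's own loss.
    """
    potentials = [
        math.exp(-t * sq_dist(feats[i], feats[j]))
        for i in range(len(feats))
        for j in range(i + 1, len(feats))
    ]
    return math.log(sum(potentials) / len(potentials))


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical 8-D features for 4 image-text pairs (stand-ins for encoder outputs).
    imgs = [l2_normalize([random.gauss(0, 1) for _ in range(8)]) for _ in range(4)]
    # Matching "text" features: noisy copies of the image features, re-normalized.
    txts = [l2_normalize([x + random.gauss(0, 0.1) for x in u]) for u in imgs]
    print("alignment:", alignment_loss(imgs, txts))
    print("uniformity:", uniformity_loss(imgs + txts))
```

On the unit sphere the uniformity term is bounded: identical points give 0, and two antipodal points (squared distance 4) give -8 with t = 2, so pushing features apart lowers the loss while the alignment term keeps matched pairs close.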
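The results above are reported as "summed recall". On RSICD and RSITMD this is conventionally the sum of R@1, R@5 and R@10 for both image-to-text and text-to-image retrieval, so its maximum is 600; the sketch below assumes that definition. It also assumes each image has five captions stored contiguously (captions_per_image=5, so caption t belongs to image t // 5). The function names are hypothetical and are not taken from the HVSA code base.

```python
def recall_at_k(sim, gt, k):
    """Percentage of queries whose top-k ranked candidates include a correct match.

    sim[q][c] is the similarity of query q to candidate c;
    gt[q] is the set of candidate indices that count as correct for query q.
    """
    hits = 0
    for q, row in enumerate(sim):
        # Rank candidate indices by descending similarity.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if any(c in gt[q] for c in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(sim)


def summed_recall(sim_i2t, captions_per_image=5, ks=(1, 5, 10)):
    """Sum of R@k over both retrieval directions (maximum 100 * 2 * len(ks)).

    sim_i2t[i][t] is the similarity of image i to caption t. Assumption: caption t
    belongs to image t // captions_per_image.
    """
    n_img, n_txt = len(sim_i2t), len(sim_i2t[0])
    cpi = captions_per_image
    # Image -> text: any of the image's captions counts as a hit.
    gt_i2t = [set(range(i * cpi, (i + 1) * cpi)) for i in range(n_img)]
    # Text -> image: transpose the matrix; each caption has exactly one correct image.
    sim_t2i = [[sim_i2t[i][t] for i in range(n_img)] for t in range(n_txt)]
    gt_t2i = [{t // cpi} for t in range(n_txt)]
    i2t = [recall_at_k(sim_i2t, gt_i2t, k) for k in ks]
    t2i = [recall_at_k(sim_t2i, gt_t2i, k) for k in ks]
    return sum(i2t) + sum(t2i)


if __name__ == "__main__":
    # Toy example: 2 images with 1 caption each; every match ranks first.
    sim = [[0.9, 0.1], [0.2, 0.8]]
    print(summed_recall(sim, captions_per_image=1))  # 600.0
```

Under this definition, the reported margins imply previous best scores of 118.47 (120.97 - 2.50) on RSICD and 188.45 (198.94 - 10.49) on RSITMD.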

Keywords:
Hypersphere, Computer science, Artificial intelligence, Feature learning, Inference, Pattern recognition, Robustness, Feature extraction, Embedding, Machine learning, Deep learning

Metrics

Cited by: 37
FWCI (Field-Weighted Citation Impact): 6.73
References: 71
Citation normalized percentile: 0.96

Topics

Multimodal Machine Learning Applications
Advanced Image and Video Retrieval Techniques
Domain Adaptation and Few-Shot Learning