JOURNAL ARTICLE

Masking-Based Cross-Modal Remote Sensing Image–Text Retrieval via Dynamic Contrastive Learning

Zuopeng Zhao, Xiaoran Miao, Chen He, Jianfeng Hu, Bingbing Min, Yumeng Gao, Ying Liu, Kanyaphakphachsorn Pharksuwan

Year: 2024  Journal: IEEE Transactions on Geoscience and Remote Sensing  Vol: 62  Pages: 1-15  Publisher: Institute of Electrical and Electronics Engineers

Abstract

Cross-modal remote sensing image–text retrieval (CMRSITR) aims to extract comprehensive information from diverse modalities. The primary challenge in this field is learning effective mappings from the visual and textual modalities into a shared latent space. Existing approaches generally rely on pre-trained unimodal models to extract features from each modality independently. However, these techniques often fail to achieve the alignment necessary for effective cross-modal matching, since they concentrate on feature extraction and alignment at the instance level only, leaving room for improvement at the token level. To address these limitations, we introduce the Masked Interaction Inferring and Aligning (MIIA) framework, built on dynamic contrastive learning (DCL). The framework discerns the intricate relationships between local visual and textual tokens, thereby strengthening the alignment of global image–text pairs without relying on additional prior supervision. First, we devise a Masked Interaction Inferring (MII) module, which fosters token-level interplay through a novel masked visual-language modeling approach. Second, we implement a cross-modal dynamic contrastive learning (DCL) mechanism that captures and aligns semantic correlations between images and texts more effectively. Finally, to ensure comprehensive matching of visual and textual embeddings, we introduce a technique called Bidirectional Distribution Matching (BDM), which minimizes the Kullback-Leibler (KL) divergence between the image–text similarity distributions computed over the negative queues maintained in momentum contrastive learning. Comprehensive experiments on well-established public datasets consistently validate the state-of-the-art performance of MIIA on the CMRSITR task.
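The BDM idea described above can be illustrated with a minimal sketch: turn similarity scores over a negative queue into probability distributions with a temperature-scaled softmax, then penalize the KL divergence between the two retrieval directions. This is an illustrative reconstruction, not the paper's implementation; the function names (`bdm_loss`, `softmax`, `kl_divergence`), the temperature value, and the symmetric weighting are all assumptions made for the example.

```python
import math

def softmax(scores, temperature=0.07):
    # Convert raw similarity scores into a probability distribution.
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two discrete distributions; eps guards log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def bdm_loss(sim_i2t, sim_t2i, target_i2t, target_t2i):
    """Symmetric KL between the online encoder's similarity distributions
    and target distributions from the momentum encoder's negative queue,
    for both retrieval directions (image->text and text->image)."""
    p_i2t, p_t2i = softmax(sim_i2t), softmax(sim_t2i)
    q_i2t, q_t2i = softmax(target_i2t), softmax(target_t2i)
    return 0.5 * (kl_divergence(q_i2t, p_i2t) + kl_divergence(q_t2i, p_t2i))
```

Under this sketch, the loss is zero when the two directions' distributions agree with their momentum targets and grows as they diverge, which matches the abstract's description of aligning the image-to-text and text-to-image similarity distributions.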

Keywords:
Computer science, Artificial intelligence, Matching (statistics), Contrast (vision), Focus (optics), Modality (human–computer interaction), Modal, Feature learning, Similarity (geometry), Modalities, Pattern recognition (psychology), Natural language processing, Machine learning, Image (mathematics)

Metrics

Cited By: 17
FWCI (Field Weighted Citation Impact): 9.01
Refs: 78
Citation Normalized Percentile: 0.96 (in top 1% and top 10%)


Topics

Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Advanced Image and Video Retrieval Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Domain Adaptation and Few-Shot Learning (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

JOURNAL ARTICLE

Hypersphere-Based Remote Sensing Cross-Modal Text–Image Retrieval via Curriculum Learning

W Zhang, Jihao Li, Shuoke Li, Jialiang Chen, Wenkai Zhang, Xin Gao, Xian Sun

Journal: IEEE Transactions on Geoscience and Remote Sensing  Year: 2023  Vol: 61  Pages: 1-15
JOURNAL ARTICLE

A fusion-based contrastive learning model for cross-modal remote sensing retrieval

Haoran Li, Wei Xiong, Yaqi Cui, Zhenyu Xiong

Journal: International Journal of Remote Sensing  Year: 2022  Vol: 43 (9)  Pages: 3359-3386
JOURNAL ARTICLE

Cross-Modal Contrastive Learning for Remote Sensing Image Classification

Zhixi Feng, Liangliang Song, Shuyuan Yang, Xinyu Zhang, Licheng Jiao

Journal: IEEE Transactions on Geoscience and Remote Sensing  Year: 2023  Vol: 61  Pages: 1-13
JOURNAL ARTICLE

Contrastive Learning-Based Fine-Tuning Method for Cross-Modal Text-Image Retrieval

Wei Zhao, Xuan Ma, Weigang Wang

Journal: Concurrency and Computation Practice and Experience  Year: 2025  Vol: 37 (21-22)