JOURNAL ARTICLE

Cross-Modal Feature Fusion Retrieval for Remote Sensing Image-Voice Retrieval

Abstract

As remote sensing applications become more widespread, some emergency scenarios, such as earthquake rescue, require rapid retrieval of remote sensing images. Because voice input is highly efficient, researchers have focused on cross-modal remote sensing image-voice retrieval methods. However, these methods have two major drawbacks: speech input lacks discriminative power, and intra-modal semantic information is underused. To address these drawbacks, we propose a novel cross-modal feature fusion retrieval model. Our model learns a better cross-modal common feature space than previous models and thereby improves retrieval performance. First, it adds extra textual keyword information to the audio feature for remote sensing image retrieval. Second, it introduces inter-modality adversarial learning and intra-modality semantic discrimination into the remote sensing image-voice retrieval task. We conducted experiments on two datasets modified from the UCM-Captions dataset and the Remote Sensing Image Caption Dataset. The experimental results show that our model outperforms state-of-the-art models on this task.
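
The idea of enriching a spoken query with textual keyword information before matching it against images can be illustrated with a minimal sketch. It is not the model from the article: the fusion by concatenation, the function names (`fuse_query`, `rank_images`), and the toy vectors are all assumptions made for illustration, and the image features are assumed to already live in a shared 4-dimensional space. The adversarial and semantic-discrimination training that the abstract describes for learning that space is not covered here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_query(audio_feat, keyword_feat):
    """Hypothetical fusion: concatenate audio and keyword features, then normalize."""
    return l2_normalize(list(audio_feat) + list(keyword_feat))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_images(query, image_feats):
    """Return image ids sorted by descending cosine similarity to the query."""
    scores = {img_id: cosine(query, feat) for img_id, feat in image_feats.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (made up): 2-d audio feature, 2-d keyword feature, 4-d image features.
audio_feat = [0.9, 0.1]
keyword_feat = [0.0, 1.0]
image_feats = {
    "harbor": [0.8, 0.2, 0.1, 0.9],
    "forest": [0.1, 0.9, 0.9, 0.0],
    "runway": [0.9, 0.0, 0.9, 0.1],
}

query = fuse_query(audio_feat, keyword_feat)
ranking = rank_images(query, image_feats)
print(ranking)
```

In this toy setup the keyword half of the query separates "harbor" from "runway", which have almost identical audio-side components; that is the kind of added discrimination the abstract attributes to keyword information.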

Keywords:
Computer science; Feature (linguistics); Image retrieval; Modal; Modality (human–computer interaction); Artificial intelligence; Feature vector; Visual Word; Task (project management); Pattern recognition (psychology); Information retrieval; Computer vision; Image (mathematics)

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 0.31
Refs: 14
Citation Normalized Percentile: 0.57

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Deep Cross-Modal Image–Voice Retrieval in Remote Sensing

Yaxiong Chen, Xiaoqiang Lu, Shuai Wang

Journal: IEEE Transactions on Geoscience and Remote Sensing Year: 2020 Vol: 58 (10) Pages: 7049-7061

JOURNAL ARTICLE

Remote Sensing Cross-Modal Retrieval by Deep Image-Voice Hashing

Yichao Zhang, Xiangtao Zheng, Xiaoqiang Lu

Journal: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing Year: 2022 Vol: 15 Pages: 9327-9338

JOURNAL ARTICLE

Cross-Modal Remote Sensing Image Retrieval Via Intra- and Inter-Modal Feature Matching

Fanglong Yao, Nayu Liu, Peiguang Li, Dongshuo Yin, Chenglong Liu, Xian Sun

Journal: IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium Year: 2022 Pages: 1792-1795