With the increasing popularity of remote sensing technology, some emergency scenarios, such as earthquake rescue, require rapid retrieval of remote sensing images. Because voice input is highly efficient, researchers have focused on cross-modal remote sensing image-voice retrieval methods. However, these methods have two major drawbacks: speech input lacks discriminative power, and intra-modal semantic information is underused. To address these drawbacks, we propose a novel cross-modal feature fusion retrieval model. Our model learns a better-optimized cross-modal common feature space than previous models and thus improves retrieval performance. First, our model augments the audio features with additional textual keyword information for remote sensing image retrieval. Second, it introduces inter-modality adversarial learning and intra-modality semantic discrimination into the remote sensing image-voice retrieval task. We conducted experiments on two datasets modified from the UCM-Captions dataset and the Remote Sensing Image Caption Dataset. The experimental results show that our model outperforms state-of-the-art models on this task.
Yaxiong Chen, Xiaoqiang Lu, Shuai Wang