JOURNAL ARTICLE

Deep Cross-Modal Retrieval for Remote Sensing Image and Audio

Abstract

<p>Remote sensing image retrieval has many important applications in civilian and military fields, such as disaster monitoring and target detection. However, existing research on image retrieval, which falls mainly into two directions (text-based and content-based), cannot meet the need for rapid and convenient retrieval in some special applications and emergency scenarios. Text-based retrieval is limited by keyboard input, which is too slow for some urgent situations, while content-based retrieval needs an example image as a query, and such an image usually does not exist. Speech, as a direct, natural and efficient mode of human-machine interaction, can make up for these shortcomings. Hence, this paper proposes a novel cross-modal retrieval method for remote sensing images and spoken audio. We first build a large-scale remote sensing image dataset with many manually annotated spoken audio captions for the cross-modal retrieval task. We then design a Deep Visual-Audio Network that directly learns the correspondence between image and audio; the model integrates feature extraction and multi-modal learning into a single network. Experiments on the proposed dataset verify the effectiveness of our approach and demonstrate that speech-to-image retrieval is feasible. © 2018 IEEE.</p>
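The abstract does not spell out how the learned image-audio correspondence is used at query time. A minimal sketch, assuming the image branch and the audio branch of the network each map their input to a fixed-length embedding in a shared space, is to rank images by cosine similarity to a spoken query and to train with a bidirectional hinge (margin) ranking loss. The cosine similarity, the loss, the margin of 0.2, all function names and the toy vectors below are illustrative assumptions, not the paper's stated design.

```python
import math
from typing import List, Sequence

# Hypothetical embeddings: in a real system these would be the outputs of the
# image branch and the audio branch of a two-branch network (assumption).
Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(audio_query: Vector, image_embs: List[Vector]) -> List[int]:
    """Speech-to-image retrieval: image indices sorted from most to least similar."""
    scores = [cosine(audio_query, img) for img in image_embs]
    return sorted(range(len(image_embs)), key=lambda i: scores[i], reverse=True)


def recall_at_k(audio_embs: List[Vector], image_embs: List[Vector], k: int) -> float:
    """Fraction of spoken queries whose paired image (same index) is in the top k."""
    hits = sum(1 for i, q in enumerate(audio_embs) if i in rank_images(q, image_embs)[:k])
    return hits / len(audio_embs)


def hinge_ranking_loss(image_embs: List[Vector], audio_embs: List[Vector],
                       margin: float = 0.2) -> float:
    """Assumed bidirectional margin ranking loss over a batch of matched pairs.

    Pair i (image i, audio i) should score at least `margin` higher than every
    mismatched pair that shares its image or its audio caption.
    """
    n = len(image_embs)
    total = 0.0
    for i in range(n):
        positive = cosine(image_embs[i], audio_embs[i])
        for j in range(n):
            if j == i:
                continue
            # image i paired with the wrong caption j
            total += max(0.0, margin - positive + cosine(image_embs[i], audio_embs[j]))
            # caption i paired with the wrong image j
            total += max(0.0, margin - positive + cosine(image_embs[j], audio_embs[i]))
    return total / n


# Toy 3-D embeddings for three image/caption pairs (made up for illustration).
IMAGES = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
AUDIOS = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9]]

if __name__ == "__main__":
    print(rank_images(AUDIOS[0], IMAGES))  # [0, 1, 2]
    print(recall_at_k(AUDIOS, IMAGES, 1))  # 1.0
    print(hinge_ranking_loss(IMAGES, AUDIOS))  # 0.0
    print(hinge_ranking_loss(IMAGES, AUDIOS[::-1]) > 0.0)  # True
```

With correctly paired captions every hinge term is inactive and the loss is zero; reversing the captions makes the loss positive, which is the training signal that would push the two branches toward a common embedding space under this assumed setup.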

Keywords:
Computer science, Image retrieval, Modal, Task (project management), Artificial intelligence, Feature (linguistics), Deep learning, Image (mathematics), Information retrieval, Feature extraction, Visual Word, Computer vision

Metrics

Cited By: 55
FWCI (Field Weighted Citation Impact): 2.60
Refs: 35
Citation Normalized Percentile: 0.90

Topics

Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Retrieval and Classification Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Deep Cross-Modal Image–Voice Retrieval in Remote Sensing

Yaxiong Chen, Xiaoqiang Lu, Shuai Wang

Journal: IEEE Transactions on Geoscience and Remote Sensing; Year: 2020; Vol: 58 (10); Pages: 7049-7061

JOURNAL ARTICLE

Remote Sensing Cross-Modal Retrieval by Deep Image-Voice Hashing

Yichao Zhang, Xiangtao Zheng, Xiaoqiang Lu

Journal: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; Year: 2022; Vol: 15; Pages: 9327-9338

JOURNAL ARTICLE

Cross-Modal Remote Sensing Image–Audio Retrieval With Adaptive Learning for Aligning Correlation

Jinghao Huang, Yaxiong Chen, Shengwu Xiong, Xiaoqiang Lu

Journal: IEEE Transactions on Geoscience and Remote Sensing; Year: 2024; Vol: 62; Pages: 1-13