JOURNAL ARTICLE

Cross-modal Image-Text Retrieval with Multitask Learning

Abstract

In this paper, we propose a multi-task learning approach for cross-modal image-text retrieval. First, a correlation network is proposed for the relation recognition task, which helps learn the complicated relations and common information of different modalities. Then, we propose a correspondence cross-modal autoencoder for the cross-modal input reconstruction task, which helps correlate the hidden representations of two uni-modal autoencoders. In addition, to further improve cross-modal retrieval performance, two regularization terms (variance and consistency constraints) are applied to the cross-modal embeddings so that the learned common information has large variance and is modality invariant. Finally, to enable large-scale cross-modal similarity search, a flexible binary transform network is designed to convert the text and image embeddings into binary codes. Extensive experiments on two benchmark datasets demonstrate that our model consistently outperforms the strong baseline methods it is compared against. Source code is available at https://github.com/daerv/DAEVR.
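
To illustrate why binary codes help with large-scale similarity search, the following minimal sketch ranks image codes by Hamming distance to a text query code. It is not the authors' binary transform network, which is learned: a plain sign threshold stands in for it here as an assumption, and the function names (`binarize`, `hamming`, `rank_by_hamming`) and the embedding values are hypothetical.

```python
from typing import List, Sequence


def binarize(embedding: Sequence[float]) -> int:
    """Pack the signs of an embedding into an integer code (bit i is 1 if value i > 0).

    Assumption: a fixed sign threshold replaces the learned binary transform.
    """
    code = 0
    for i, value in enumerate(embedding):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, database_codes: Sequence[int]) -> List[int]:
    """Return database indices sorted by ascending Hamming distance to the query."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )


# Hypothetical 4-dimensional embeddings in a shared space (made-up values).
text_query = [0.9, -0.2, 0.4, -0.7]
image_embeddings = [
    [-0.5, 0.3, -0.1, 0.8],  # every sign differs from the query
    [0.7, -0.1, 0.2, -0.9],  # same sign pattern as the query
    [0.6, 0.4, 0.3, -0.2],   # one sign differs from the query
]

query_code = binarize(text_query)
image_codes = [binarize(e) for e in image_embeddings]
ranking = rank_by_hamming(query_code, image_codes)
print(ranking)
```

Because each comparison reduces to an XOR followed by a bit count, codes of this kind can be compared far more cheaply than real-valued embeddings, which is the usual motivation for hashing in retrieval.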

Keywords:
Computer science, Modal, Artificial intelligence, Binary number, Pattern recognition (psychology), Autoencoder, Benchmark (surveying), Source code, Binary code, Machine learning, Deep learning, Mathematics

Metrics

Cited By: 22
FWCI (Field Weighted Citation Impact): 1.50
Refs: 15
Citation Normalized Percentile: 0.86

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Deep Learning-based Cross-Modal Image-Text Retrieval

Songxuan Li

Year: 2025 Pages: 448-452
BOOK-CHAPTER

A Cross-Modal Image-Text Retrieval System with Deep Learning

Shuang Liu, Han Qiao, Qingzhen Xu

Communications in Computer and Information Science Year: 2021 Pages: 538-548
JOURNAL ARTICLE

Improving Cross-Modal Image-Text Retrieval With Teacher-Student Learning

Junhao Liu, Min Yang, Chengming Li, Ruifeng Xu

Journal: IEEE Transactions on Circuits and Systems for Video Technology Year: 2020 Vol: 31 (8) Pages: 3242-3253
JOURNAL ARTICLE

Image-text bidirectional learning network based cross-modal retrieval

Zhuoyi Li, Huibin Lu, Hao Fu, Guanghua Gu

Journal: Neurocomputing Year: 2022 Vol: 483 Pages: 148-159