In this paper, we propose a multi-task learning approach for cross-modal image-text retrieval. First, a correlation network is proposed for the relation recognition task, which helps learn the complicated relations and common information of different modalities. Then, we propose a correspondence cross-modal autoencoder for the cross-modal input reconstruction task, which helps correlate the hidden representations of the two uni-modal autoencoders. In addition, to further improve retrieval performance, two regularization terms (variance and consistency constraints) are imposed on the cross-modal embeddings so that the learned common information has large variance and is modality-invariant. Finally, to enable large-scale cross-modal similarity search, a flexible binary transform network is designed to convert the text and image embeddings into binary codes. Extensive experiments on two benchmark datasets demonstrate that our model consistently outperforms strong baseline methods. Source code is available at \url{https://github.com/daerv/DAEVR}.
Junhao Liu, Min Yang, Chengming Li, Ruifeng Xu
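The regularizers and the binary transform described above can be sketched concretely. The snippet below is a minimal NumPy illustration, not the paper's implementation: the embeddings are random stand-ins for the output of the correlation network and correspondence autoencoder, the hinge-on-std form of the variance constraint and the mean-squared consistency term are one plausible instantiation, and the binary transform is reduced to a hypothetical linear projection followed by `sign()`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 4, 128, 32  # pairs, embedding dim, code length

# Toy common-space embeddings for aligned image/text pairs
# (stand-ins for the learned cross-modal embeddings).
img = rng.standard_normal((n, d))
txt = img + 0.05 * rng.standard_normal((n, d))

def variance_loss(z, target=1.0):
    # One plausible variance constraint: penalize dimensions whose
    # standard deviation collapses below `target`, encouraging the
    # common representation to keep large variance.
    std = z.std(axis=0)
    return np.maximum(target - std, 0.0).mean()

def consistency_loss(z_img, z_txt):
    # Consistency constraint: mean squared distance between paired
    # embeddings, pushing the representation to be modality-invariant.
    return ((z_img - z_txt) ** 2).mean()

def binary_codes(z, W, b):
    # Hypothetical binary transform: linear projection + sign(),
    # yielding k-bit {-1, +1} codes for fast Hamming-space search.
    return np.sign(z @ W + b).astype(np.int8)

def hamming_distance(a, b):
    # For {-1, +1} codes of length k: dist = (k - a . b) / 2.
    k = a.shape[1]
    return (k - a @ b.T) // 2

W, bias = rng.standard_normal((d, k)), np.zeros(k)
img_codes = binary_codes(img, W, bias)
txt_codes = binary_codes(txt, W, bias)
dist = hamming_distance(img_codes, txt_codes)  # rank texts per image
```

At retrieval time, each image's candidate texts would be ranked by ascending Hamming distance, which replaces costly continuous similarity search with cheap bitwise operations.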