JOURNAL ARTICLE

Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation

Abstract

Cross-modal retrieval has attracted growing research interest in recent years for its theoretical and practical significance. This paper proposes a new technique for learning a deep visual-semantic embedding that is more effective and interpretable for cross-modal retrieval. The proposed method employs a two-stage strategy. In the first stage, deep mutual information estimation is incorporated into the objective to maximize the mutual information between the input data and its embedding. In the second stage, an expelling branch is added to the network to disentangle modality-exclusive information from the learned representations. This reduces the impact of modality-exclusive information on the common-subspace representation and improves the interpretability of the learned features. Extensive experiments on two large-scale benchmark datasets demonstrate that our method learns a better visual-semantic embedding and achieves state-of-the-art cross-modal retrieval results.
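The first stage described in the abstract, maximizing mutual information between an input and its embedding, is commonly realized with a contrastive lower bound such as InfoNCE. The sketch below is illustrative only, not the authors' implementation: the function name, cosine-similarity scoring, and temperature value are assumptions, and the paper's actual deep MI estimator may use a different bound.

```python
import numpy as np

def infonce_mi_lower_bound(z_x, z_y, temperature=0.1):
    """InfoNCE lower bound on I(X; Y) from paired embeddings.

    z_x, z_y: (N, D) arrays where row i of each forms a positive pair
    (e.g. an image embedding and its matching caption embedding).
    Returns log N minus the mean contrastive cross-entropy; larger
    values indicate higher estimated mutual information.
    """
    # L2-normalize so the score matrix holds cosine similarities.
    z_x = z_x / np.linalg.norm(z_x, axis=1, keepdims=True)
    z_y = z_y / np.linalg.norm(z_y, axis=1, keepdims=True)
    scores = z_x @ z_y.T / temperature  # (N, N); diagonal = positives

    # Log-softmax over the N candidates for each x_i.
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    n = z_x.shape[0]
    return np.log(n) + np.mean(np.diag(log_probs))
```

Maximizing this quantity with respect to the encoder parameters pushes matched cross-modal pairs together and mismatched pairs apart, which is one standard way to realize the MI-maximization objective the abstract describes. Note the bound is capped at log N, so larger batches allow tighter estimates.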

Keywords:
Cross-modal retrieval, Mutual information, Disentangled representation, Visual-semantic embedding, Interpretability, Feature learning, Subspace learning, Deep learning, Machine learning, Pattern recognition, Information retrieval

Metrics

Cited by: 33
FWCI (Field-Weighted Citation Impact): 1.60
References: 46
Citation Normalized Percentile: 0.87

Topics

Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Advanced Image and Video Retrieval Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Domain Adaptation and Few-Shot Learning (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

BOOK-CHAPTER

Variational Deep Representation Learning for Cross-Modal Retrieval

Chen Yang, Zongyong Deng, Tianyu Li, Hao Liu, Libo Liu

Lecture Notes in Computer Science, 2021, pp. 498-510
CONFERENCE PAPER

Disentangled Speaker Representation Learning via Mutual Information Minimization

Sung Hwan Mun, Min Hyun Han, Minchan Kim, Dongjune Lee, Nam Soo Kim

Proceedings of the 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022, pp. 89-96