JOURNAL ARTICLE

Cross-modal Metric Learning with Graph Embedding

Abstract

Metric learning with neural networks has shown promising improvements in representation learning. Yet cross-modal retrieval poses a distinct challenge for metric learning: how to compute distances across different modalities such as image and text. Existing neural-network-based methods tend to build two separate branches, one for images and one for texts, to bridge the modality gap, and most of them cannot fully exploit the structure embedded in multimodal data. This paper introduces an embedding layer that provides a non-linear representation shared across modalities, and reformulates cross-modal retrieval as a graph embedding problem by constructing a multimodal graph. To learn the graph embedding, training pairs and triplets are uniformly generated from random walk sequences on the graph, and graph pair and triplet constraints are imposed on the embedding layer to preserve the graph structure. Meanwhile, a classifier is trained on labeled data so that the learned embedding carries semantic information. For optimization, the graph pair and triplet constraints are integrated with the supervised classifier in a unified multi-task learning framework. Experimental results on the Wiki and NUS-WIDE datasets demonstrate the effectiveness and superiority of the learned embedding for cross-modal retrieval.
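The abstract does not specify the walk length, context window, negative-sampling scheme, or the exact form of the pair and triplet constraints. The Python sketch below is therefore only an illustration under stated assumptions: it builds a small undirected image-text graph, runs uniform random walks (each step picks a neighbor with equal probability), treats nodes that co-occur within a window as positive pairs, draws a uniformly sampled other node as the triplet negative, and combines a squared-distance pair term, a margin-based triplet hinge term, and a classifier loss into one weighted multi-task objective. The node names, `window`, `margin`, `lam_pair`, `lam_trip`, and all function names are hypothetical, and the embeddings are fixed vectors rather than outputs of a trained embedding layer.

```python
import random
from collections import defaultdict

# Hypothetical sketch: node names, window size, margin, negative sampling and
# loss weights are illustrative assumptions, not values from the paper.

def build_graph(edges):
    """Undirected adjacency lists; nodes can be image or text items."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return dict(adj)

def random_walk(adj, start, length, rng):
    """Uniform random walk: each step picks a neighbor with equal probability."""
    walk = [start]
    while len(walk) < length:
        nbrs = adj.get(walk[-1], [])
        if not nbrs:
            break  # dead end: stop the walk early
        walk.append(rng.choice(nbrs))
    return walk

def pairs_and_triplets(adj, walks, window, rng):
    """Nodes co-occurring within `window` steps form a positive pair; a
    uniformly sampled other node is used as the triplet negative (assumed)."""
    nodes = sorted(adj)
    pairs, triplets = [], []
    for walk in walks:
        for i, anchor in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                pos = walk[j]
                if pos == anchor:
                    continue  # skip self-pairs, including revisits of the anchor
                pairs.append((anchor, pos))
                candidates = [x for x in nodes if x not in (anchor, pos)]
                if candidates:
                    triplets.append((anchor, pos, rng.choice(candidates)))
    return pairs, triplets

def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def pair_loss(emb, pairs):
    """Pull embeddings of graph-neighbor pairs together (mean squared distance)."""
    if not pairs:
        return 0.0
    return sum(sq_dist(emb[a], emb[b]) for a, b in pairs) / len(pairs)

def triplet_loss(emb, triplets, margin=1.0):
    """Hinge: the anchor should be closer to the positive than to the negative by `margin`."""
    if not triplets:
        return 0.0
    total = 0.0
    for a, p, n in triplets:
        total += max(0.0, margin + sq_dist(emb[a], emb[p]) - sq_dist(emb[a], emb[n]))
    return total / len(triplets)

def multitask_loss(cls_loss, p_loss, t_loss, lam_pair=0.5, lam_trip=0.5):
    """Weighted sum of classifier, pair and triplet terms (weights assumed)."""
    return cls_loss + lam_pair * p_loss + lam_trip * t_loss

# Usage on a toy 4-node image-text graph (hypothetical data).
rng = random.Random(0)
adj = build_graph([("img0", "txt0"), ("img1", "txt1"), ("txt0", "txt1"), ("img0", "img1")])
walks = [random_walk(adj, n, 5, rng) for n in sorted(adj)]
pairs, triplets = pairs_and_triplets(adj, walks, window=2, rng=rng)
print(len(pairs), len(triplets))
```

Sampling pairs and triplets from walk windows, rather than only from direct edges, lets nodes that are a few hops apart (for example an image and a text linked through shared neighbors) still be pulled together, which is one plausible reading of "structure preservation" here.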

Keywords:
Embedding, Computer science, Classifier, Graph embedding, Theoretical computer science, Modal, Artificial intelligence, Graph, Feature learning, Machine learning, Exploit, Pattern recognition

Metrics

Cited by: 0
FWCI (Field Weighted Citation Impact): 0.00
References: 49
Citation Normalized Percentile: 0.12

Topics

Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Human Pose and Action Recognition
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Learning Cross-Modal Aligned Representation With Graph Embedding

Youcai Zhang, Jiayan Cao, Xiaodong Gu

Journal: IEEE Access, Year: 2018, Vol: 6, Pages: 77321-77333
JOURNAL ARTICLE

Graph Embedding with Similarity Metric Learning

Tao Tao, Qianqian Wang, Yue Ruan, Xue Li, Xiujun Wang

Journal: Symmetry, Year: 2023, Vol: 15 (8), Pages: 1618-1618
BOOK-CHAPTER

Graph Embedding Learning for Cross-Modal Information Retrieval

Youcai Zhang, Xiaodong Gu

Lecture Notes in Computer Science, Year: 2017, Pages: 594-601
JOURNAL ARTICLE

Cross-Modal Retrieval with Heterogeneous Graph Embedding

Dapeng Chen, Min Wang, Haobin Chen, Lin Wu, Jing Qin, Wei Peng

Journal: Proceedings of the 30th ACM International Conference on Multimedia, Year: 2022, Pages: 3291-3300
BOOK-CHAPTER

Improving Supervised Cross-modal Retrieval with Semantic Graph Embedding

Changting Feng, Dagang Li, Jingwei Zheng

Lecture Notes in Computer Science, Year: 2021, Pages: 187-199