JOURNAL ARTICLE

Learning Cross-Modal Aligned Representation With Graph Embedding

Youcai Zhang, Jiayan Cao, Xiaodong Gu

Year: 2018; Journal: IEEE Access; Vol: 6; Pages: 77321-77333; Publisher: Institute of Electrical and Electronics Engineers

Abstract

The main task of cross-modal analysis is to learn discriminative representations shared across different modalities. To obtain aligned representations, conventional approaches either construct and optimize a linear projection or train a complex deep architecture, yet it is difficult to balance accuracy and efficiency when modeling multimodal data. This paper proposes a novel graph-embedding learning framework implemented with neural networks. The learned embedding directly approximates the cross-modal aligned representation and is used to perform cross-modal retrieval and image classification that incorporates text information. The proposed framework extracts the learned representation from a graph model and simultaneously trains a classifier in a semi-supervised setting. For optimization, unlike previous methods based on graph Laplacian regularization, a sampling strategy generates training pairs that fully capture the inter-modal and intra-modal similarity relationships. Experimental results on various datasets show that the proposed framework outperforms other state-of-the-art methods on cross-modal retrieval. The framework also demonstrates convincing improvements on the new task of image classification combined with text information on the Wiki dataset.
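The abstract contrasts graph Laplacian regularization with sampling training pairs directly from a graph. As a rough illustration only, the sketch below assumes a graph whose nodes are image and text instances, with an edge between each co-occurring image/text pair (inter-modal) and edges between labelled instances that share a class (both intra- and inter-modal). It trains linear projections into a shared space with a skip-gram-style negative-sampling loss. The graph construction, the loss, the toy data, and every function name (`build_edges`, `sample_batch`, `loss_and_grad`, `train`) are hypothetical choices for illustration, not the authors' implementation; the semi-supervised classifier branch is omitted.

```python
import numpy as np


def sigmoid(x):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def build_edges(labels):
    """Hypothetical graph: node i is image i, node n + i is its paired text.
    labels[i] == -1 marks an unlabelled image/text pair."""
    n = len(labels)
    edges = [(i, n + i) for i in range(n)]  # inter-modal: co-occurring pair
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] >= 0 and labels[i] == labels[j]:
                edges += [(i, j), (n + i, n + j),  # intra-modal, same class
                          (i, n + j), (n + i, j)]  # inter-modal, same class
    return np.array(edges)


def sample_batch(edges, num_nodes, batch, neg, rng):
    """Draw positive pairs uniformly from the edges, plus `neg` uniform negatives each."""
    idx = rng.integers(len(edges), size=batch)
    u, v = edges[idx, 0], edges[idx, 1]
    flip = rng.random(batch) < 0.5  # use both directions of each edge
    u, v = np.where(flip, v, u), np.where(flip, u, v)
    negs = rng.integers(num_nodes, size=(batch, neg))
    return u, v, negs


def loss_and_grad(W, X, u, v, negs):
    """Negative-sampling loss on embeddings Z = X @ W and its gradient w.r.t. W."""
    Z = X @ W
    pos = np.sum(Z[u] * Z[v], axis=1)             # (B,) positive-pair scores
    neg = np.einsum("bk,bnk->bn", Z[u], Z[negs])  # (B, neg) negative scores
    # -log sigmoid(pos) - sum log sigmoid(-neg), averaged over the batch
    loss = np.mean(np.logaddexp(0, -pos) + np.logaddexp(0, neg).sum(axis=1))
    B = len(u)
    g_pos = (sigmoid(pos) - 1.0) / B  # dLoss/dpos
    g_neg = sigmoid(neg) / B          # dLoss/dneg
    dZ = np.zeros_like(Z)
    np.add.at(dZ, u, g_pos[:, None] * Z[v] + np.einsum("bn,bnk->bk", g_neg, Z[negs]))
    np.add.at(dZ, v, g_pos[:, None] * Z[u])
    np.add.at(dZ, negs, g_neg[:, :, None] * Z[u][:, None, :])
    return loss, X.T @ dZ


def train(X, edges, k=8, steps=300, lr=0.1, batch=64, neg=5, seed=0):
    """Plain SGD on the sampled pairs; returns the projection and loss history."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], k))
    history = []
    for _ in range(steps):
        u, v, negs = sample_batch(edges, X.shape[0], batch, neg, rng)
        loss, dW = loss_and_grad(W, X, u, v, negs)
        W -= lr * dW
        history.append(loss)
    return W, history


# Toy data (made up): 40 image/text pairs, 2 classes, only the first 20 labelled.
rng = np.random.default_rng(0)
n, d_img, d_txt = 40, 6, 4
y = rng.integers(0, 2, size=n)
img = rng.normal(size=(n, d_img)) + 2.0 * y[:, None]
txt = rng.normal(size=(n, d_txt)) - 2.0 * y[:, None]
labels = np.where(np.arange(n) < 20, y, -1)

# Block layout: image features in the first d_img columns, text in the rest,
# so a single W acts as two modality-specific projections into R^k.
X = np.zeros((2 * n, d_img + d_txt))
X[:n, :d_img] = img
X[n:, d_img:] = txt
X /= np.linalg.norm(X, axis=1, keepdims=True)

W, history = train(X, build_edges(labels))
Z_img, Z_txt = X[:n] @ W, X[n:] @ W  # shared-space embeddings for retrieval
```

Because negatives are drawn uniformly over all nodes, some share the anchor's class; this mirrors plain skip-gram negative sampling and is only one of several possible choices. Sampling edges lets each update touch a small batch of pairs instead of the full graph, which is the practical contrast with a Laplacian penalty over all edges.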

Keywords:
Computer science, Artificial intelligence, Discriminative model, Modal, Graph embedding, Embedding, Classifier, Feature learning, Pattern recognition, Graph, Machine learning, Cross-modal, Theoretical computer science

Metrics

Cited by: 2
FWCI (Field-Weighted Citation Impact): 0.29
References: 65
Citation Normalized Percentile: 0.57

Topics

Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Graph Embedding Contrastive Multi-Modal Representation Learning for Clustering

Wei Xia, Tianxiu Wang, Quanxue Gao, Ming Yang, Xinbo Gao

Journal: IEEE Transactions on Image Processing; Year: 2023; Vol: 32; Pages: 1170-1183

BOOK-CHAPTER

Graph Embedding Learning for Cross-Modal Information Retrieval

Youcai Zhang, Xiaodong Gu

Lecture Notes in Computer Science; Year: 2017; Pages: 594-601

JOURNAL ARTICLE

Cross-Modal Retrieval with Heterogeneous Graph Embedding

Dapeng Chen, Min Wang, Haobin Chen, Lin Wu, Jing Qin, Wei Peng

Journal: Proceedings of the 30th ACM International Conference on Multimedia; Year: 2022; Pages: 3291-3300

JOURNAL ARTICLE

Dynamic scale position embedding for cross-modal representation learning

Jungkyoo Shin, Seong-Min Kang, Yoon-Sik Cho, Eunwoo Kim

Journal: Neural Networks; Year: 2025; Vol: 193; Article: 108087