Learning Cross-Modal Aligned Representation With Graph Embedding

Youcai Zhang; Jiayan Cao; Xiaodong Gu

doi:10.1109/access.2018.2881997


JOURNAL ARTICLE


Year: 2018; Journal: IEEE Access; Vol: 6; Pages: 77321-77333; Publisher: Institute of Electrical and Electronics Engineers



Abstract

The main task of cross-modal analysis is to learn a discriminative representation shared across different modalities. To obtain an aligned representation, conventional approaches tend either to construct and optimize a linear projection or to train a complex deep architecture, yet it is difficult to balance accuracy and efficiency when modeling multimodal data. This paper proposes a novel graph-embedding learning framework implemented by neural networks. The learned embedding directly approximates the cross-modal aligned representation and is used for cross-modal retrieval and for image classification that incorporates text information. The proposed framework extracts the learned representation from a graph model and simultaneously trains a classifier under semi-supervised settings. For optimization, unlike previous methods based on graph Laplacian regularization, a sampling strategy is adopted to generate training pairs that fully explore the inter-modal and intra-modal similarity relationships. Experimental results on various datasets show that the proposed framework outperforms other state-of-the-art methods on cross-modal retrieval. The framework also demonstrates convincing improvements on the new task of image classification combining text information on the Wiki dataset.
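To make the pair-sampling idea in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy set of labeled image and text nodes and assumes that two nodes count as similar when their class labels match; the node names, the `NODES` table, and the `sample_pairs` helper are all hypothetical. Each sampled pair is tagged as intra-modal (same modality) or inter-modal (different modalities), so a downstream embedding model could be trained on both kinds of relationship.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy data: node id -> (modality, class label).
# The paper's actual graph construction is not described in the abstract.
NODES: Dict[str, Tuple[str, int]] = {
    "img0": ("image", 0),
    "img1": ("image", 0),
    "img2": ("image", 1),
    "txt0": ("text", 0),
    "txt1": ("text", 1),
    "txt2": ("text", 1),
}


def sample_pairs(
    nodes: Dict[str, Tuple[str, int]], num_pairs: int, seed: int = 0
) -> List[Tuple[str, str, str, int]]:
    """Draw (node_a, node_b, kind, target) tuples uniformly at random.

    kind is "intra" when both nodes share a modality, "inter" otherwise.
    target is 1 when the class labels match (assumed similar), else 0.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    ids = sorted(nodes)
    pairs = []
    for _ in range(num_pairs):
        a, b = rng.sample(ids, 2)  # two distinct nodes
        mod_a, label_a = nodes[a]
        mod_b, label_b = nodes[b]
        kind = "intra" if mod_a == mod_b else "inter"
        target = 1 if label_a == label_b else 0
        pairs.append((a, b, kind, target))
    return pairs


pairs = sample_pairs(NODES, num_pairs=8)
for pair in pairs:
    print(pair)
```

Uniform sampling is used here only for simplicity; a real system would likely weight the sampling by graph structure or similarity scores, which the abstract does not specify.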

Keywords:

Computer science; Artificial intelligence; Discriminative model; Modal; Graph embedding; Embedding; Classifier (UML); Feature learning; Pattern recognition (psychology); Graph; Machine learning; Crossmodal; Theoretical computer science

Metrics

FWCI (Field Weighted Citation Impact): 0.29

Citation Normalized Percentile: 0.57

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence


Related Documents

Cross-modal Metric Learning with Graph Embedding

Graph Embedding Contrastive Multi-Modal Representation Learning for Clustering

Graph Embedding Learning for Cross-Modal Information Retrieval

Cross-Modal Retrieval with Heterogeneous Graph Embedding

Dynamic scale position embedding for cross-modal representation learning