Instance-Level Semantic Alignment for Zero-Shot Cross-Modal Retrieval

Kai Wang; Yifan Wang; Xing Xu; Zuo Cao; Xunliang Cai

doi:10.1109/icme52920.2022.9860026

Year: 2022; Journal: 2022 IEEE International Conference on Multimedia and Expo (ICME); Pages: 1-6

Abstract

Zero-Shot Cross-Modal Retrieval (ZS-CMR) is challenging because of the heterogeneous distributions across modalities and the inconsistent semantics between seen and unseen classes. Previous methods usually perform class-level semantic alignment of data from different modalities by introducing auxiliary word embeddings of class labels. This approach has a fatal limitation: learning only class-level information ignores intra-modal variance. To solve this problem, we propose the Instance-Level Semantic Alignment (ILSA) method, which makes full use of instance-level information. We use two disentanglement variational auto-encoders to decompose the data from the two modalities into modality-specific and modality-invariant features. With an instance-level semantic feature extractor and a distribution generator, ILSA generates more appropriate distributions from the learned instance-level semantic features, without any auxiliary knowledge. We conduct experiments on six widely used datasets under two ZS-CMR scenarios; the results show that our method establishes new state-of-the-art performance on all datasets.
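The abstract does not give the loss functions or the retrieval procedure, so the following is only a minimal sketch of the general ingredients it names, under stated assumptions. It assumes that each encoder outputs a diagonal Gaussian, that the latent vector is split into an invariant part and a specific part by position, that a KL divergence is used to pull the invariant posterior toward a generated target distribution, and that retrieval ranks gallery items by cosine similarity of their invariant features. All function names (`split_latent`, `reparameterize`, `kl_diag_gaussians`, `rank_gallery`) are hypothetical and are not taken from the paper.

```python
import math
import random


def split_latent(z, invariant_dim):
    """Split a latent vector into (invariant, specific) parts by position.

    Assumption: the first `invariant_dim` entries are the shared part.
    """
    return z[:invariant_dim], z[invariant_dim:]


def reparameterize(mu, log_var, rng):
    """Standard VAE sampling: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    """KL(q || p) for two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, log_var_q, mu_p, log_var_p):
        total += 0.5 * (lp - lq
                        + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp)
                        - 1.0)
    return total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending similarity to the query."""
    return sorted(range(len(gallery)),
                  key=lambda i: -cosine(query, gallery[i]))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 4-dimensional latent: 2 invariant + 2 specific dimensions.
    z = reparameterize([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 0.0], rng)
    invariant, specific = split_latent(z, invariant_dim=2)
    print(len(invariant), len(specific))
    # Query from one modality, gallery from the other (toy vectors).
    print(rank_gallery([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]))
```

In a cross-modal setting, the query would come from one modality (for example, an image) and the gallery from the other (for example, text), with both represented only by their invariant features. Discarding the specific part at retrieval time is a design choice of this sketch, not something the abstract states.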

Keywords: Computer science; Modal; Artificial intelligence; Semantics (computer science); Invariant (physics); Class (philosophy); Generator (circuit theory); Natural language processing; Pattern recognition (psychology); Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 0.41

Citation Normalized Percentile: 0.68

Topics

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Fine-Grained Alignment Network for Zero-Shot Cross-Modal Retrieval

Generalized Zero-Shot Cross-Modal Retrieval

Correlated Features Synthesis and Alignment for Zero-shot Cross-modal Retrieval

Semantic-Adversarial Graph Convolutional Network for Zero-Shot Cross-Modal Retrieval

Progressive Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval