JOURNAL ARTICLE

Instance-Level Semantic Alignment for Zero-Shot Cross-Modal Retrieval

Kai Wang, Yifan Wang, Xing Xu, Zuo Cao, Xunliang Cai

Year: 2022 Journal: 2022 IEEE International Conference on Multimedia and Expo (ICME) Pages: 1-6

Abstract

Zero-shot Cross-Modal Retrieval (ZS-CMR) is challenging because of the heterogeneous distributions across modalities and the inconsistent semantics between seen and unseen classes. Previous methods usually perform class-level semantic alignment of data from different modalities by introducing auxiliary word embeddings of class labels. This approach has a fatal limitation: learning only class-level information neglects intra-modal variance. To solve this problem, we propose the Instance-Level Semantic Alignment (ILSA) method, which makes full use of instance-level information. We use two disentanglement variational auto-encoders to decompose the data from the two modalities into modality-specific and modality-invariant features. With an instance-level semantic feature extractor and a distribution generator, ILSA can generate more appropriate distributions from the learned instance-level semantic features, without any auxiliary knowledge. We conduct experiments on six widely used datasets in two ZS-CMR scenarios; the results show that our method establishes new state-of-the-art performance on all datasets.
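The decomposition step described in the abstract can be illustrated with a rough sketch. The code below is not the ILSA implementation: the abstract gives no architecture details, so the linear encoder heads, the half-and-half latent split, the squared-distance alignment term, and every name and dimension here (`init_encoder`, `encode`, `squared_distance`, the feature sizes) are assumptions made purely for illustration. It only shows the general idea of two variational encoders, one per modality, whose latent codes are each split into a modality-invariant part and a modality-specific part.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_encoder(in_dim, latent_dim, rng):
    """Random weights for the mean and log-variance heads (hypothetical layout)."""
    def matrix():
        return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                for _ in range(latent_dim)]
    return {"w_mu": matrix(), "b_mu": [0.0] * latent_dim,
            "w_logvar": matrix(), "b_logvar": [0.0] * latent_dim}


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def encode(x, enc, rng):
    """Sample a latent code and split it into (invariant, specific) halves."""
    mu = linear(x, enc["w_mu"], enc["b_mu"])
    logvar = linear(x, enc["w_logvar"], enc["b_logvar"])
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
         for m, lv in zip(mu, logvar)]
    half = len(z) // 2
    return z[:half], z[half:], kl_to_standard_normal(mu, logvar)


def squared_distance(a, b):
    """Simple alignment penalty between two invariant codes (an assumption)."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


rng = random.Random(0)
# One encoder per modality; input sizes differ, latent size is shared.
image_encoder = init_encoder(in_dim=8, latent_dim=4, rng=rng)
text_encoder = init_encoder(in_dim=5, latent_dim=4, rng=rng)

image_features = [rng.random() for _ in range(8)]
text_features = [rng.random() for _ in range(5)]

img_inv, img_spec, img_kl = encode(image_features, image_encoder, rng)
txt_inv, txt_spec, txt_kl = encode(text_features, text_encoder, rng)

# Only the invariant halves are compared across modalities.
alignment = squared_distance(img_inv, txt_inv)
```

Because both encoders share a latent size, the invariant halves have the same length and can be compared directly, while the specific halves remain free to capture modality-dependent detail. A trainable version would add decoders and a reconstruction loss, which are omitted here.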

Keywords:
Computer science, Modal, Artificial intelligence, Semantics (computer science), Invariant (physics), Class (philosophy), Generator (circuit theory), Natural language processing, Pattern recognition (psychology), Mathematics

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 0.41
Refs: 24
Citation Normalized Percentile: 0.68

Topics

Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Fine-Grained Alignment Network for Zero-Shot Cross-Modal Retrieval

Shiping Ge, Zhiwei Jiang, Yafeng Yin, Cong Wang, Zifeng Cheng, Qing Gu

Journal: ACM Transactions on Multimedia Computing Communications and Applications Year: 2025 Vol: 21 (10) Pages: 1-24

JOURNAL ARTICLE

Generalized Zero-Shot Cross-Modal Retrieval

Titir Dutta, Soma Biswas

Journal: IEEE Transactions on Image Processing Year: 2019 Vol: 28 (12) Pages: 5953-5962

JOURNAL ARTICLE

Progressive Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval

Cheng Deng, Xinxun Xu, Hao Wang, Muli Yang, Dacheng Tao

Journal: IEEE Transactions on Image Processing Year: 2020 Vol: 29 Pages: 8892-8902