JOURNAL ARTICLE

Multi-level Fusion of Multi-modal Semantic Embeddings for Zero Shot Learning

Abstract

Zero-shot learning aims to recognize objects whose instances are not covered by the training data. To generalize knowledge from seen classes to novel ones, a semantic space is built that embeds knowledge from various views into multi-modal semantic embeddings. Existing semantic embeddings neglect the relationships between classes, which are essential for transferring knowledge across classes; moreover, existing zero-shot learning models ignore the complementarity between semantic embeddings from different modalities. To tackle these problems, we resort to graph theory to explicitly model the interdependence between classes and thereby obtain new modal semantic embeddings. We further propose a multi-level fusion model that effectively combines the knowledge encoded in multi-modal semantic embeddings. By virtue of a subsequent fusion block, the results of multi-level fusion are further enriched and fused. Experiments show that our model achieves promising results on various datasets, and an ablation study suggests that our method is well suited for zero-shot learning.
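The two ideas sketched in the abstract can be illustrated in a few lines. The snippet below is a minimal sketch, not the authors' implementation: it assumes a GCN-style symmetric-normalized propagation step to model class interdependence over a relation graph, and a simple concatenation step as one level of multi-modal fusion. All function names and the toy data are hypothetical.

```python
import numpy as np

def propagate(embeddings, adjacency):
    """One step of symmetric-normalized graph propagation (GCN-style):
    each class embedding is mixed with those of its related classes."""
    a = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    d = np.diag(1.0 / np.sqrt(a.sum(axis=1)))    # D^{-1/2}
    return d @ a @ d @ embeddings                # D^{-1/2} A D^{-1/2} E

def fuse(text_emb, attr_emb):
    """Early fusion by concatenation; later levels could fuse again."""
    return np.concatenate([text_emb, attr_emb], axis=1)

# Toy example: 3 classes with a chain-shaped relation graph.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
text = np.random.rand(3, 4)   # e.g. word-vector embeddings per class
attr = np.random.rand(3, 5)   # e.g. attribute embeddings per class

fused = fuse(propagate(text, adj), propagate(attr, adj))
print(fused.shape)  # (3, 9)
```

In the paper's setting, a second fusion block would then operate on `fused` to further enrich and combine the multi-level representations; here only the first level is shown.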

Keywords:
Zero-shot learning; Semantic embeddings; Multi-modal fusion; Graph theory; Artificial intelligence; Natural language processing

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 18
Citation Normalized Percentile: 0.11

Topics

Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Infrastructure Maintenance and Monitoring
Physical Sciences →  Engineering →  Civil and Structural Engineering