JOURNAL ARTICLE

Multi-level Fusion of Multi-modal Semantic Embeddings for Zero Shot Learning

Abstract

Zero-shot learning aims to recognize objects whose instances are not covered by the training data. To generalize knowledge from seen classes to novel ones, a semantic space is built that embeds knowledge from various views into multi-modal semantic embeddings. Existing semantic embeddings neglect the relationships between classes, which are essential for transferring knowledge across classes; moreover, existing zero-shot learning models ignore the complementarity between semantic embeddings from different modalities. To tackle these problems, we resort to graph theory to explicitly model the interdependence between classes and thereby obtain new modal semantic embeddings. We further propose a multi-level fusion model that effectively combines the knowledge encoded in multi-modal semantic embeddings. By virtue of a subsequent fusion block, the results of multi-level fusion are further enriched and fused. Experiments show that our model achieves promising results on various datasets, and an ablation study suggests that our method is well suited for zero-shot learning.
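The two ideas sketched in the abstract can be illustrated in a few lines. The snippet below is a minimal sketch, not the authors' implementation: it assumes a GCN-style symmetric-normalized propagation step to model class interdependence over a relation graph, and a simple concatenation step as one level of multi-modal fusion. All function names and the toy data are hypothetical.

```python
import numpy as np

def propagate(embeddings, adjacency):
    """One step of symmetric-normalized graph propagation (GCN-style):
    each class embedding is mixed with those of its related classes."""
    a = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    d = np.diag(1.0 / np.sqrt(a.sum(axis=1)))    # D^{-1/2}
    return d @ a @ d @ embeddings                # D^{-1/2} A D^{-1/2} E

def fuse(text_emb, attr_emb):
    """Early fusion by concatenation; later levels could fuse again."""
    return np.concatenate([text_emb, attr_emb], axis=1)

# Toy example: 3 classes with a chain-shaped relation graph.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
text = np.random.rand(3, 4)   # e.g. word-vector embeddings per class
attr = np.random.rand(3, 5)   # e.g. attribute embeddings per class

fused = fuse(propagate(text, adj), propagate(attr, adj))
print(fused.shape)  # (3, 9)
```

In the paper's setting, a second fusion block would then operate on `fused` to further enrich and combine the multi-level representations; here only the first level is shown.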

Keywords:
Zero-shot learning; Semantic embeddings; Multi-modal fusion; Graph theory; Artificial intelligence; Natural language processing

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 18
Citation Normalized Percentile: 0.11

Topics

Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Infrastructure Maintenance and Monitoring
Physical Sciences →  Engineering →  Civil and Structural Engineering