Zero-shot learning (ZSL) studies how to transfer knowledge from seen classes to unseen classes. In the generalized zero-shot learning (GZSL) setting, however, the test set contains both seen- and unseen-class data, so predictions tend to be biased toward the seen classes. Generative approaches have shown strong performance in addressing this issue, but methods built on GANs or VAEs suffer from training instability when balancing their network components, which degrades sampling quality. Therefore, to generate more realistic unseen-class samples, we propose a new generative method, Diffusion Generalized Zero-Shot Learning (DiffGZSL), which is interpretable and more stable to optimize for both conventional ZSL (CZSL) and GZSL. Specifically, DiffGZSL adopts a diffusion model as the generative network for zero-shot learning. In addition, since semantic information is the key to transferring knowledge from seen to unseen classes, both category and attribute guidance are incorporated into the sampling process to generate discriminative visual features. DiffGZSL achieves competitive performance against recent popular models on four ZSL benchmark datasets in both CZSL and GZSL settings. Moreover, ablation and analysis experiments show that the proposed conditional sampling strategy guided by categories and attributes is effective.
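To make the idea of category- and attribute-guided diffusion sampling concrete, the following is a minimal sketch, not the paper's actual implementation: a DDPM-style reverse process generates a synthetic visual feature while a stand-in noise predictor is conditioned on a class embedding and an attribute vector. All names, shapes, and the toy denoiser are illustrative assumptions.

```python
# Illustrative sketch (assumption, not DiffGZSL's code) of conditional
# diffusion sampling for feature generation, conditioned on both a class
# embedding and an attribute vector, as the abstract describes.
import numpy as np

rng = np.random.default_rng(0)

T = 50                                  # number of diffusion steps
D = 16                                  # visual-feature dimensionality
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def toy_eps_model(x_t, t, class_emb, attr_vec):
    """Stand-in noise predictor: a fixed linear map of its inputs.
    A real model would be a neural network trained on seen-class features."""
    cond = class_emb + attr_vec          # fuse category and attribute guidance
    return 0.5 * x_t - 0.1 * cond        # "predicted noise" (illustrative only)

def sample_feature(class_emb, attr_vec):
    """DDPM-style reverse process: start from Gaussian noise and iteratively
    denoise, conditioning every step on the class and attribute signals."""
    x = rng.standard_normal(D)
    for t in reversed(range(T)):
        eps = toy_eps_model(x, t, class_emb, attr_vec)
        # mean of x_{t-1} given x_t and the predicted noise
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                        # add noise at every step except the last
            x = x + np.sqrt(betas[t]) * rng.standard_normal(D)
    return x

# Generate one synthetic feature for a hypothetical unseen class.
class_emb = rng.standard_normal(D)
attr_vec = rng.standard_normal(D)
feat = sample_feature(class_emb, attr_vec)
print(feat.shape)  # (16,)
```

Features sampled this way for unseen classes could then be used to train an ordinary classifier over all classes, which is the usual role of generated samples in generative GZSL pipelines.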
Xuan Liu, Yaoqin Xie, Chenbin Liu, Jun Cheng, Songhui Diao, Shan Tan, Xiaokun Liang